5 Troubleshooting Oracle Stream Analytics

Oracle Stream Analytics provides the capability to analyze and monitor streams of events in real time. Users can create pipelines to model stream processing solutions. Typically, Oracle Stream Analytics pipelines run continuously for long durations until they are explicitly stopped or they fail. This guide explains how to troubleshoot various issues that you can encounter while creating, deploying, or running pipelines.

Pipeline Monitoring and Debug Metrics 

For every running Oracle Stream Analytics pipeline, there is a corresponding Spark application deployed on the Spark cluster. If an Oracle Stream Analytics pipeline is deployed and running in draft mode or published mode, you can monitor and analyze the corresponding Spark application using the real-time metrics provided by Oracle Stream Analytics. These metrics provide detailed run-time insight into every pipeline stage, and you can drill down to operator-level details. Note that these metrics are in addition to the metrics provided by Spark.

For each application, Oracle Stream Analytics provides detailed monitoring metrics that you can use to analyze whether a pipeline is working as expected.

To access monitoring and debug metrics:

  1. Open Spark Master UI.
    You can obtain the Spark Master URL from the System Settings. The Spark Master page displays a list of running applications. Each entry in the list contains details about the application such as application ID, name, owner, and current status.
  2. Open Application Master Page.
    Click the link with the caption ApplicationMaster to open the Application Details page. Ensure that you click the link for the application corresponding to your pipeline. For more details see Determine the Spark Application Name Corresponding to a Pipeline.
  3. Click the Pipeline tab. This is the Pipeline Summary page, which has details of all stages inside an Oracle Stream Analytics pipeline.

    This page has the following information about a pipeline:

    • Pipeline ID: Unique pipeline ID in the Spark cluster.

    • Pipeline Name: Name of the Oracle Stream Analytics pipeline given by the user in the Oracle Stream Analytics UI.

    • Pipeline Stages: This section displays a table with detailed runtime metrics for each pipeline stage. Each entry in the table corresponds to a pipeline stage in the Oracle Stream Analytics UI pipeline graph.

    • Pipeline DAG: This is a visual representation of all stages in the form of a DAG, displaying the parent-child relationship between the pipeline stages. The diagram also shows information about the transformations performed in each pipeline stage. Click any Stage ID entry in the Pipeline Stages table to open the Pipeline Stage Details page.

    • Total Number of Transformations: This measurement is the number of Spark transformations applied to compute each stage.

      Oracle Stream Analytics supports various types of stages, for example, Query Stage, Pattern Stage, and Custom Stage. For each pipeline stage, Oracle Stream Analytics defines a list of transformations that are applied to the input stream of the stage. The output of the final transformation is the output of the stage.

    • Total Output Partitions: This measurement is the total number of partitions in the output stream of each stage.

      Every pipeline stage has its own partitioning requirements, which are determined from the stage configuration. For example, if a QUERY stage defines a summary function but does not define a group-by column, then the QUERY stage has only one partition because there is no partitioning criterion available.

    • Total Output Events: This measurement is the total number of output events (not micro-batches) emitted by each stage.

    • Average Output Rate: This measurement is the rate at which each stage has emitted output events so far. The rate is the ratio of the total number of output events so far to the total application execution time. For example, a stage that has emitted 3,600 events over a 1,800-second run has an average output rate of 2 events per second.

      If the rate is ZERO, it does not always mean that there is an error in stage processing. Sometimes a stage does not output any record at all (for example, no event passed the filter in a Query stage). A rate of zero can also be displayed when the output rate is less than 1 event per second.

    • Current Output Rate: This measurement is the rate at which each stage is currently emitting output events. The rate is the ratio of the number of output events emitted since the last metrics page refresh to the time elapsed since that refresh. To get a better picture of the current output rate, refresh the page more frequently.

  4. The Stage Summary page provides details about all the transformations in a stage. This page has the following information:
    • Pipeline ID: Unique pipeline ID in the Spark cluster.

    • Pipeline Name: Name of the Oracle Stream Analytics pipeline given by the user in the Oracle Stream Analytics UI.

    • Stage ID: Unique stage ID in the DAG of stages for the Oracle Stream Analytics pipeline.

    • Stage Transformations: This section displays a table with details about all the transformations performed in the pipeline stage. Each entry in the table corresponds to a transformation operation used in the computation of the stage. You can observe that the final transformation in every stage is MonitorDStream. This is because MonitorDStream pipes the output of the stage to the Oracle Stream Analytics UI Live Table.

    • Stage DAG: This is a visual representation of all transformations in the form of a DAG, displaying the parent-child relationship between the pipeline transformations.

      In the example shown, OfferByAge is a RULE stage in the Oracle Stream Analytics pipeline. To compute this stage, Oracle Stream Analytics computes a CQLDStream transformation. Inside the CQLDStream transformation, the input data is transformed using a continuously running query (CQL) in a CQL engine. Note that there is one CQL engine associated with each executor.

  5. Click CQLDStream to open the CQL Engine Summary page corresponding to the transformation.
    • Transformation Name: Name of the output DStream for the transformation. Every transformation in Spark results in an output DStream.

    • Transformation Type: This is the category of each transformation used in the stage execution. If the transformation type is "Oracle", the transformation is based on Oracle's proprietary transformation algorithm. If the transformation type is "Native", the transformation is provided by the Apache Spark implementation.

  6. The CQL Engine Summary page has details about all the transformations in a stage. This page has the following information:
    • Pipeline ID: Unique pipeline ID in the Spark cluster.

    • Pipeline Name: Name of the Oracle Stream Analytics pipeline given by the user in the Oracle Stream Analytics UI.

    • Stage ID: Unique stage ID in the DAG of stages for the Oracle Stream Analytics pipeline.

    • Running Queries: This section displays the list of CQL queries running to compute the CQL transformation for a stage. The table displays a system-generated Query ID and the Query Text. See Oracle Continuous Query Language Reference for CQL query syntax and semantics. To see more details about a query, click the query ID hyperlink in the table entry to open the CQL Engine Query Details page.

    • Registered Sources: This section displays internal CQL metadata about all the input sources on which the query is based. For every input stream of the stage, there is one entry in this table.

      Each entry contains the source name, source type, timestamp type, and stream attributes. The timestamp type can be PROCESSING or EVENT. If a stream is PROCESSING timestamped, the timestamp of each event is assigned by the system. If a stream is EVENT timestamped, the timestamp of each event is defined by one of the stream attributes. A source can be a Stream or a Relation.

    • External Sources: This section displays details about all external sources with which the input stream is joined. An external source can be a database table or a Coherence cache.

    • CQL Engines: This section displays a table with details about all instances of CQL engines used by the pipeline. The fields of the table are:

      • CQLEngine Id: System-generated ID for a CQL engine instance.

      • ExecutorId: ID of the executor with which the CQL engine is associated.

      • Executor Host: Address of the cluster node on which this CQL engine is running.

      • Status: Current status of the CQL engine, either ACTIVE or INACTIVE. If the status is ACTIVE, the CQL engine instance is up and running; otherwise, the CQL engine has been killed.

  7. The CQL Engine Query Details page contains details about execution and HA statistics for the query corresponding to an Oracle Stream Analytics pipeline stage. The following details are displayed on this page:
    • Query ID: System generated identifier for query

    • Query Text: Query String

    • Num Partitions: This field shows the degree of parallelism of the query. The degree of parallelism is defined by the total number of input partitions processed by a query.

      The degree of parallelism depends on many factors, such as the query constructs, the number of input Kafka partitions, and the number of executors assigned to the application.

    • Execution Statistics: This section shows the detailed execution statistics of each operator.

      • Partition ID: Partition Sequence Id
      • CQL Engine ID: Sequence ID of CQL Engine on which the partition is being processed.

      • Total Output Events: Number of output events emitted by CQL query for each partition.

      • Total Output Heartbeats: Number of heartbeat events emitted by the CQL query for each partition. Note that heartbeats are special events that ensure timestamp progression in the Oracle Stream Analytics pipeline.

      • Throughput: Ratio of the total number of events processed to the total time spent in processing, for each partition.

      • Latency: Average turnaround time taken to process a partition of stream.

    • HA Statistics: This table shows real-time statistics about the query's HA operations. Note that the unit of time is MILLISECONDS.

      • Partition ID: Partition Sequence ID

      • CQL Engine ID: Sequence ID of CQL Engine on which the partition is being processed.

      • Total Full Snapshots Created: Total number of times the full state of the query was serialized and saved.

      • Avg Full Snapshot Creation Time: Average time spent in serializing and saving the full state of the query.

      • Total Full Snapshots Loaded: Total number of times the full state of the query was de-serialized and loaded into the query plan.

      • Avg Full Snapshot Load Time: Average time spent in de-serializing and loading the full state of the query.

      • Total Journal Snapshots Created: Total number of times the journaled state of the query was serialized and saved.

      • Avg Journal Snapshot Creation Time: Average time spent in serializing and saving the journaled state of the query.

      • Total Journal Snapshots Loaded: Total number of times the journaled state of the query was de-serialized and loaded into the query plan.

      • Avg Journal Snapshot Load Time: Average time spent in de-serializing and loading the journaled state of the query.

        A full snapshot is the complete state of a query. The query state represents the internal data structures and the state of each operator in the query plan. A journal snapshot is a partial, incremental snapshot with a start time and an end time. Oracle Stream Analytics optimizes state preservation by using journal snapshots wherever possible.

  8. This page contains details about each execution operator of the CQL query for a particular partition of a stage in the pipeline.
    • Query ID: System generated identifier for query

    • Query Text: Query String

    • Partition ID: All operator details on this page correspond to this partition ID.

    • Operator Statistics:

      • Operator ID: System-generated identifier for the operator.

      • Total Input Events: Total number of input events received by each operator.

      • Total Output Events: Total number of output events generated by each operator.

      • Total Input Heartbeats: Total number of heartbeat events received by each operator.

      • Total Output Heartbeats: Total number of heartbeat events generated by each operator.

      • Throughput(events/second): Ratio of the total number of input events processed to the total time spent in processing, for each operator.

      • Latency(ms): Total turnaround time to process an event, for each operator.

    • Operator DAG: This is a visual representation of the query plan. The DAG shows the parent-child relationship for each operator. You can drill down further into the execution statistics of an operator: click the operator to open the CQL Operator Details page.

  9. The CQL Operator Details page contains a few additional details about each execution operator; the CQL Engine Query Details page already provides all essential metrics for each operator.

Common Issues and Remedies

This section provides a comprehensive list of issues and questions, categorized by catalog item. If you face any difficulties while working on a particular catalog item, refer to the list of common issues and questions for that item.

Topics

  • Pipeline

  • Stream

  • Reference

  • Geofence

  • Connection

  • Target

  • Dashboard

  • Cube

  • Logs

Pipeline

Common issues encountered with pipelines are listed in this section.

Ensure that Pipeline is Deployed Successfully

You can deploy pipelines to any Spark Cluster (version 1.6).

Follow the steps in the sections below to verify that the pipeline is deployed and running successfully on the Spark cluster.

Verify Pipeline Deployment on Oracle Big Data Cloud Service - Compute Edition based Spark Cluster

Perform these steps if you are subscribed to Oracle Big Data Cloud Service.

  1. Go to the PSM user interface and open the home page for your Oracle Big Data Cloud Service (BDCSCE) instance.

  2. Click on the hamburger menu next to instance name and then click Big Data Cluster Console.

    [Illustration: big_data_cluster_home.png]

  3. Enter the login credentials and open the Big Data Cluster Console home page.

  4. Navigate to Jobs tab.

    You can see a list of jobs. Each job corresponds to a Spark pipeline running on your BDCSCE cluster.

    [Illustration: jobs_logs.png]

  5. Find the entry corresponding to your pipeline and check the status. For more information, see Determine the Spark Application Name Corresponding to a Pipeline.

    If you see the status as Running, then the pipeline is currently deployed and running successfully.

  6. Click the hamburger menu corresponding to the required job and then click Logs to fetch container-wise logs.

    You can download these files for further debugging.

Verify Pipeline Deployment on Apache Spark Installation based Spark Cluster

  1. Open Spark Master user interface.

    [Illustration: spark_master_ui.png]

  2. Find the entry corresponding to your pipeline and check the status. For more information, see Determine the Spark Application Name Corresponding to a Pipeline.

    If you see the status as Running, then the pipeline is currently deployed and running successfully.

Ensure that the Input Stream is Supplying Continuous Stream of Events to the Pipeline

You must have a continuous supply of events from the input stream.

  1. Go to the Catalog.

  2. Locate and click the stream you want to troubleshoot.

  3. Check the value of the topicName property under the Source Type Parameters section.

  4. Listen to the Kafka topic where the input stream for the pipeline is received.

    Since this topic is created using Kafka APIs, you cannot consume this topic with REST APIs.

    1. Listen to the Kafka topic hosted on Oracle Event Hub Cloud Service. You must use Apache Kafka utilities or any other relevant tool to listen to the topic.

      Follow these steps to listen to Kafka topic:

      1. Determine the ZooKeeper address: go to the Oracle Event Hub Cloud Service Platform home page and find the IP address of ZooKeeper.

      2. Use the following command to listen to the Kafka topic:
        ./kafka-console-consumer.sh --zookeeper IPAddress:2181 --topic nano
    2. Listen to the Kafka topic hosted on a standard Apache Kafka installation.

      You can listen to the Kafka topic using utilities from a Kafka Installation. kafka-console-consumer.sh is a utility script available as part of any Kafka installation.

      Follow these steps to listen to Kafka topic:

      1. Determine the ZooKeeper address from the Apache Kafka installation based cluster.

      2. Use the following command to listen to the Kafka topic:
        ./kafka-console-consumer.sh --zookeeper IPAddress:2181 --topic nano
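Note that the --zookeeper option applies to the older, ZooKeeper-based console consumer. If your Kafka installation uses the newer consumer (and in Kafka 2.0 and later, where the ZooKeeper-based consumer has been removed), an equivalent command connects to a broker instead. This is a hedged sketch; the broker host, port, and topic name are placeholders for your environment:

  ./kafka-console-consumer.sh --bootstrap-server kafkaBrokerHost:9092 --topic nano --from-beginning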
Determine the Spark Application Name Corresponding to a Pipeline

You can perform the following steps to determine the name of the Spark application corresponding to a pipeline.

  1. Navigate to Catalog.

  2. Open the required pipeline.

  3. Ensure that you stay in the pipeline editor and do not click Done. Otherwise, the pipeline gets undeployed.

  4. Right-click anywhere in the browser and select Inspect.

  5. Go to WS tab under the Network tab.

  6. Refresh the browser.

    New websocket connections are created.

  7. Locate a websocket whose URL has a parameter with the name topic.

    The value of the topic param is the name of the Kafka topic where the output of this stage (query or pattern) is pushed.

    [Illustration: ws_network.png]

    The topic name is of the form AppName_StageId. The pipeline (application) name can be derived from the topic name by removing the _StageId suffix. In the above snapshot, the pipeline name is sx_2_49_12_pipe1_draft.

Ensure that Caching is Working if a Pipeline is Correlating a Stream with Reference

Perform the following steps to ensure caching is leveraged in the pipeline:

  1. Go to Spark application master UI.

  2. Open the Pipeline Summary page by clicking the Pipeline tab. The Pipeline Summary page shows a table of stages with various metrics.

  3. Click the stage ID corresponding to the Query stage of the pipeline in which you are correlating the stream with the reference.

  4. In the Pipeline Stage Details page, click the CQLDStream stage to open CQL Engine Summary Details page.

  5. In the CQL Engine Summary Details page, locate the External Sources section. Note the source ID for the reference that is used in the stage.

  6. Open the CQL Engine Detailed Query Analysis page.

  7. Click the operator whose operator ID is the same as the source ID to open the CQL Engine Detailed Operator Analysis page.

  8. Look for the Cache Statistics section. If there is no such section, then caching is not enabled for the stage. If you see non-zero entries for Cache Hits and Cache Misses, then caching is enabled and active for the pipeline stage.

Live Table Shows Listening Events with No Events in the Table

There can be multiple reasons why you see no output events in the live table even though the pipeline status shows Listening Events. Use the following steps to troubleshoot this scenario:

  1. The live table shows output events only for the stage that is currently selected. At any moment, exactly one stage is selected. Try switching to a different stage. If you observe output in the live table for another stage, the problem is associated only with the original stage. To debug further, go to step 5.

    If there is no output in any stage, then move to step 2.

  2. Make sure that the pipeline is still running on the Spark cluster. For more details, see Ensure that Pipeline is Deployed Successfully.

  3. If the Spark application for your pipeline is killed or aborted, it suggests that the pipeline has crashed. To troubleshoot further, you may need to look into the application logs.

  4. If the application is in the ACCEPTED, NEW, or SUBMITTED state, the application is waiting for cluster resources and has not yet started. If there are not enough resources, check the number of VCORES in the Big Data Cloud Service Spark YARN cluster. For a pipeline, Stream Analytics requires a minimum of 3 VCORES. (A sample command for listing waiting applications is shown after these steps.)

  5. If the application is in the RUNNING state, use the following steps to troubleshoot further:

    1. Make sure that the input stream is pushing events continuously to the pipeline.

    2. If the input stream is pushing events, make sure that each pipeline stage is processing events and emitting output.

    3. If both of the above steps are verified successfully, then make sure that the application is able to push the output events of each stage to its corresponding monitor topic. To achieve this:

      1. Determine the monitor topic into which the output of the stage is being pushed, using the instructions in Determine the Topic Name where Output of Pipeline Stage is Propagated.

      2. Listen to the monitor topic and ensure that events are continuously being pushed to the topic. To listen to the Kafka topic, you must have access to the Kafka cluster where the topic is created. You can listen to the Kafka topic using utilities from a Kafka installation. kafka-console-consumer.sh is a utility script available as part of any Kafka installation.

      3. If you don't see any events in the topic, then this can be an issue related to writing output events from the stage to the monitor topic. Check the server logs, look for any exceptions, and report them to the administrator.

      4. If you can see output events in the monitor topic, then the issue can be related to reading the output events in the web browser.
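If the application seems stuck in the ACCEPTED, NEW, or SUBMITTED state (step 4 above), you can list the applications waiting for resources on a YARN-based cluster. This is a hedged sketch: it assumes command-line access to a node with the YARN client configured; if you do not have such access, check the cluster's ResourceManager UI instead.

  yarn application -list -appStates NEW,SUBMITTED,ACCEPTED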

Determine the Topic Name where Output of Pipeline Stage is Propagated

The topic name for a pipeline stage output is of the format PipelineID_StageID. A sample topic name is sx_TestPipeline_5226CDCB_2F11_4A46_8987_4BFCE6FED617_K60bGBKF_draft_st040800FD_A5B3_4AC2_90D5_ED85EB528F41. In this string, the pipeline name is sx_TestPipeline_5226CDCB_2F11_4A46_8987_4BFCE6FED617_K60bGBKF_draft and the stage ID is st040800FD_A5B3_4AC2_90D5_ED85EB528F41.
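As a hedged illustration of this naming convention, the shell snippet below splits the sample topic name above into the pipeline name and the stage ID by cutting at the final "_st" boundary. The sed patterns are illustrative only and assume stage IDs of the form st followed by hexadecimal segments:

  TOPIC=sx_TestPipeline_5226CDCB_2F11_4A46_8987_4BFCE6FED617_K60bGBKF_draft_st040800FD_A5B3_4AC2_90D5_ED85EB528F41
  echo "$TOPIC" | sed 's/_st[0-9A-F_]*$//'              # prints the pipeline name
  echo "$TOPIC" | sed 's/^.*_\(st[0-9A-F_]*\)$/\1/'     # prints the stage ID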

Here are the steps to find the topic name for a stage:

  1. Open the Pipeline Summary Page for your pipeline. If you don't know the corresponding application name for this pipeline, see Determine the Spark Application Name Corresponding to a Pipeline for instructions.

  2. This page provides the pipeline name and the various stage IDs. For every pipeline stage, you see an entry in the table.

  3. For every stage, the output topic ID is PipelineName_StageID.

  4. Click Done in the pipeline editor and then go back to Catalog and open the pipeline again.

Live Table Still Shows Starting Pipeline

There can be multiple reasons why the status of the pipeline has not changed to Listening Events from Starting Pipeline. Use the following steps to troubleshoot this scenario:

  1. Ensure that the pipeline has been successfully deployed to the Spark cluster. For more information, see Ensure that Pipeline is Deployed Successfully. Also make sure that the Spark cluster is not down and is available.

  2. If the deployment failed, check the Jetty logs to see the exceptions related to the deployment failure and fix the issues.

  3. If the deployment is successful, verify that the OSA web tier has received the pipeline deployment from Spark.

  4. Click Done in the pipeline editor and then go back to Catalog and open the pipeline again.

Ensure that a Pipeline Stage is Still Processing Events

Perform the following steps to verify whether a particular pipeline stage is still processing events:

  1. Go to Spark application master UI.

  2. Open the Pipeline Summary page by clicking the Pipeline tab. The Pipeline Summary page shows a table of stages with various metrics.

  3. Check whether Total Output Events is a non-zero value. Refresh the page to see if the value increases or stays the same. If the value remains the same and does not change for a long time, drill down into the stage details.

Stream

Common issues encountered with streams are listed in this section.

Cannot Find MapMessage Format

In this release, only CSV and JSON formats are supported for FILE streams. For Kafka streams, only CSV, JSON, and AVRO formats are supported.

Cannot Find AVRO Format

In this release, only CSV and JSON formats are supported for streams of the FILE stream type. Hence, you cannot find the AVRO format.

Cannot See Any Kafka Topic or a Specific Topic in the List of Topics

Use the following steps to troubleshoot:

  1. Go to Catalog and select the Kafka connection which you are using to create the stream.

  2. Click Next to go to Connection Details tab.

  3. Click Test Connection to verify that the connection is still active.

  4. Ensure that the topic or topics exist in the Kafka cluster. You must have access to the Kafka cluster where the topic is created. You can list all the Kafka topics using utilities from a Kafka installation, as shown in the sample command after these steps.

  5. If you cannot see your topic in the list, create the topic.

  6. If the test connection fails and you see an error message like "OSA-01266 Failed to connect to the ZooKeeper server", then the Kafka cluster is not reachable. Ensure that the Kafka cluster is up and running.
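A hedged example of listing topics follows. kafka-topics.sh is a utility script available as part of any Kafka installation; the ZooKeeper address below is a placeholder for your environment:

  ./kafka-topics.sh --zookeeper IPAddress:2181 --list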

Input Kafka Topic is Sending Data but No Events Seen in Live Table

This can happen if the incoming events do not adhere to the expected shape for the stream. To check whether events are being dropped due to a shape mismatch, use the following steps to troubleshoot:

  1. Check the lenient parameter under Source Type Properties for the stream. If it is set to FALSE, events may have been dropped due to a shape mismatch. To confirm this, check the application logs for the running application.

  2. If the property is set to TRUE, debug further:

    1. Make sure that the Kafka cluster is up and running. The Spark cluster should be able to access the Kafka cluster.

    2. If the Kafka cluster is up and running, obtain the application logs for further debugging. You can also publish a hand-crafted test event that matches the expected shape of the stream to rule out shape issues (see the sample command after these steps).
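As a hedged way to rule out a shape mismatch, you can publish a single well-formed test event to the input topic with the standard Kafka console producer and watch whether it appears in the live table. kafka-console-producer.sh is part of any Kafka installation; the broker address and topic name below are placeholders, and the sample JSON payload is illustrative only and must be adjusted to match your stream's shape. The producer reads events from standard input, so type the test event and press Enter:

  ./kafka-console-producer.sh --broker-list kafkaBrokerHost:9092 --topic nano
  {"customerId":"1001","amount":25.0}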

Connection

Common issues encountered with connections are listed in this section.

Ensure that Connection is Active

Kafka Connection

If your connection is a Kafka Connection, then use the following steps:

  1. Go to Catalog and select the Kafka connection which you want to test.

  2. Click Next to go to "Connection Details" tab.

  3. Click Test Connection. If the test is successful, it indicates that the connection is active.

Database Connection

If your connection is a database connection, then use the following steps:

  1. Go to Catalog and select the database connection which you want to test.

  2. Click Next to go to "Connection Details" tab.

  3. Click Test Connection. If the test is successful, it indicates that the connection is active.

  4. If the test fails, you see an error message indicating that a connection cannot be made to the database. To resolve the issue:

    • OSA-01260 Failed to connect to the database. IO Error: The Network Adapter could not establish the connection — indicates that the database host is not reachable from the OSA design-time environment (a basic reachability check is shown after this list).

    • OSA-01260 Failed to connect to the database. ORA-01017: invalid username/password; logon denied — indicates that the credentials are incorrect and you cannot access the database using these credentials.
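For the Network Adapter error, a quick hedged check of basic network reachability from the OSA design-time host is to probe the database listener port with netcat. The host name and port below are placeholders (1521 is the default Oracle listener port); a successful probe only confirms that the port is open, not that the credentials or service name are valid:

  nc -zv dbHost 1521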

Coherence Connection

If your connection is a Coherence connection:

OSA does not provide the capability to test the connection for a Coherence cluster. Refer to the Oracle Coherence documentation to find utilities and tools to achieve that.

Druid Connection

If your connection is a Druid connection, then use the following steps:

  1. Go to Catalog and select the Druid connection which you want to test.

  2. Click Next to go to "Connection Details" tab.

  3. Click Test Connection. If the test is successful, it indicates that the connection is active.

  4. If you see an error message like OSA-01460 Failed to connect to the druid services at zooKeeper server, then the Druid ZooKeeper is not reachable. Make sure that the Druid services and the ZooKeeper cluster are up and running (a quick reachability check is shown after these steps).
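A hedged way to confirm that the ZooKeeper node used by Druid is reachable is to send it the standard ruok four-letter command; a healthy server replies imok. The host and port below are placeholders, and this check only verifies ZooKeeper reachability, not the health of the Druid services themselves:

  echo ruok | nc zookeeperHost 2181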

JNDI Connection

If your connection is a JNDI connection, then use the following steps:

  1. Go to Catalog and select the JNDI connection which you want to test.

  2. Click Next to go to "Connection Details" tab.

  3. Click Test Connection. If the test is successful, it indicates that the connection is active.

  4. If you see an error message like OSA-01707 Communication with server failed. Ensure that server is up and server url(s) are specified correctly, then either the server is down or the server URL(s) are specified incorrectly. Server URLs can only be specified in the format host1:port1,host2:port2, and so on.

  5. If you see an error message like OSA-01706 JNDI connection failed. User: weblogic, failed to be authenticated, then ensure that the username and password are specified correctly.

Reference

Common issues encountered with references are listed in this section.

Ensure that a Reference is Cached

If a reference is of type COHERENCE, then caching is not supported because the source itself is a cache.

If a reference is of type DATABASE, then use the following steps to check whether the reference is cached:

  1. Go to Catalog and open the required Reference.

  2. Verify that the Enable Caching property in the Source Type Parameters section is set to TRUE. If it is TRUE, caching is enabled. If not, edit the reference and enable the property.

Target

Common issues encountered with targets are listed in this section.

Cannot Find AVRO and MapMessage Formats for REST Target

In this release, only CSV and JSON formats are supported for REST targets.

Cannot Find MapMessage Format for Kafka Target

In this release, only CSV, JSON and AVRO formats are supported for Kafka targets.

Cannot See Any Events in REST Endpoint for REST Target

If a pipeline is in draft mode, OSA does not push events to the REST endpoint. Only a published application pushes events to the REST target.

Cannot See Any Events in the Target for Kafka Topic

If a pipeline is in draft mode, OSA does not push events to the Kafka topic. Only a published application pushes events to the output Kafka target.

Geofence

Common issues encountered with geofences are listed in this section.

Ensure that Name and Description for Geo Fence is Set Correctly

If the name and description fields are not displayed for a database-based geofence, follow the steps below:

  1. Go to Catalog and click Edit for the required database-based geo fence.

  2. Click Edit for Source Type Properties and then Next.

  3. Ensure that the mappings for Name and Description are defined in the Shape section.

  4. Once these mappings are defined, you can see the name and description for the geofence.

Ensure that DB-based Geofence is Working

To ensure that a database-based geofence is working:

  1. Go to Catalog and open the required database-based geo fence.

  2. Ensure that the connection used in the geofence is active by clicking the Test Connection button in the database connection wizard.

  3. Ensure that the table used in the geofence is still valid and exists in the database.

  4. Go to the geofence page and verify that the issue is resolved.

Cube

Common issues encountered with cubes are listed in this section.

Unable to Explore Cube which was Working Earlier

If you are unable to explore a cube which was working earlier, follow the steps mentioned below:

  1. Check whether the Druid ZooKeeper or the associated services for the indexer, broker, middle manager, or overlord are down.

  2. Click the Druid connection and navigate to next page.

  3. Test the connection. This step tells you whether the services are down and need to be looked into.

Cube Displays "Datasource not Ready"

If you keep seeing “Datasource not Ready” message when you explore a cube, follow the steps mentioned below:

  1. Go to the Druid indexer logs. Generally, the indexer console is available at http://DRUID_HOST:3090/console.html.

  2. Look for an entry index_kafka_<cube-name>_<somehash> in the running tasks. If there is no such entry in the running tasks, look for the same entry in the pending tasks or completed tasks.

  3. If the entry lies in the pending tasks, it means that the workers are running out of capacity and the datasource will be picked up for indexing as soon as capacity is available.

  4. In such cases, either wait, or increase the worker capacity and restart the Druid services, or kill some existing datasource indexing tasks (they will be started again after some time).

  5. If the entry lies in the completed tasks with a "FAILED" status, it means that indexing failed either due to an incorrect ingestion spec or due to a resource issue.

  6. You can find the exact reason by clicking the "log (all)" link and navigating to the exception.

  7. If it is due to ingestion, try changing the timestamp format. Druid fails to index if the timestamp is not in a Joda-compatible time format, or if the specified time format does not match the format of the timestamp values. An illustrative example follows these steps.
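As a hedged illustration (the values and the format pattern below are examples only, not taken from a specific cube definition), a timestamp value and a Joda-compatible format pattern that matches it look like this:

  timestamp value:   2018-03-09T10:15:30.000Z
  matching pattern:  yyyy-MM-dd'T'HH:mm:ss.SSS'Z'

If the incoming events instead carry timestamps such as 09-MAR-2018 10:15:30, indexing with the pattern above fails; the pattern declared for the cube must match the actual timestamp values.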

Dashboard

Common issues encountered with dashboards are listed in this section.

Visualizations Appearing Earlier are No Longer Available in Dashboard

Use the following steps to troubleshoot:

  1. A missing streaming visualization might be due to the following reasons:

    1. The corresponding pipeline or stage for the missing visualization no longer exists.

    2. The visualization itself has been removed from the catalog or the pipeline editor.

  2. A missing exploration visualization (created from a cube) might occur because the cube or the visualization has already been deleted.

Dashboard Layout Reset after You Resized/moved the Visualizations

Use the following steps to troubleshoot:

  1. This can happen if you forget to save the dashboard after moving or resizing the visualizations.

  2. Make sure to click Save after changing the layout.

Streaming Visualizations Do not Show Any Data

Use the following steps to troubleshoot:

  1. Go to the visualization in the pipeline editor and make sure that the live output table is displaying data.

  2. If there is no output, make sure that the pipeline is deployed and running on the cluster. Once the live output table displays data, it shows up in the streaming visualization.

Troubleshoot Live Output

For every pipeline, there is one Spark Streaming pipeline running on the Spark cluster. If a Stream Analytics pipeline uses one or more Query or Pattern stages, the pipeline runs one or more continuous queries for each of these stages.

For more information about continuous query, see Understanding Oracle CQL.

If there are no output events in the Live Output Table for a Query or Pattern stage, use the following steps to determine or narrow down the problem:

  1. Ensure that Pipeline is Deployed Successfully

  2. Ensure that the Input Stream is Supplying Continuous Stream of Events to the Pipeline

  3. Ensure that CQL Queries for Each Query Stage Emit Output

  4. Ensure that the Output of Stage is Available

Ensure that CQL Queries for Each Query Stage Emit Output

Check whether the CQL queries are emitting output events by monitoring the CQL queries using CQL Engine metrics.

Follow these steps to check the output events:

  1. Open CQL Engine Query Details page. For more information, see Access CQL Engine Metrics.

  2. Check that at least one partition has Total Output Events greater than zero under the Execution Statistics section.

    [Illustration: cql_engine_query_details.png]

    If your query is running without any errors and input data is arriving continuously, then the Total Output Events count keeps rising.

Ensure that the Output of Stage is Available

One of the essential steps in troubleshooting a pipeline is to ensure that the output of the stage is available in the monitor topic.

Follow these steps to check if the output stream is available in the monitor topic:

  1. Ensure that you stay in the pipeline editor and do not click Done. Otherwise, the pipeline gets undeployed.

  2. Right-click anywhere in the browser and click Inspect.

  3. Select Network from the top tab and then select WS.

  4. Refresh the browser.

    New websocket connections are created.

  5. Locate a websocket whose URL has a parameter with the name topic.

    The value of the topic param is the name of the Kafka topic where the output of this stage is pushed.

    [Illustration: websocket_network.png]

  6. Listen to the Kafka topic to which the output of the stage is being pushed.

    Since this topic is created using Kafka APIs, you cannot consume this topic with REST APIs. Follow these steps to listen to the Kafka topic:

    1. Listen to the Kafka topic hosted on Oracle Event Hub Cloud Service. You must use Apache Kafka utilities or any other relevant tool to listen to the topic.

      Follow these steps to listen to Kafka topic:

      1. Determine the ZooKeeper address: go to the Oracle Event Hub Cloud Service Platform home page and find the IP address of ZooKeeper.

      2. Use the following command to listen to the Kafka topic:
        ./kafka-console-consumer.sh --zookeeper IPAddress:2181 --topic sx_2_49_12_pipe1_draft_st60
    2. Listen to the Kafka topic hosted on a standard Apache Kafka installation.

      You can listen to the Kafka topic using utilities from a Kafka Installation. kafka-console-consumer.sh is a utility script available as part of any Kafka installation.

      Follow these steps to listen to Kafka topic:

      1. Determine the ZooKeeper address from the Apache Kafka installation based cluster.

      2. Use the following command to listen to the Kafka topic:
        ./kafka-console-consumer.sh --zookeeper IPAddress:2181 --topic sx_2_49_12_pipe1_draft_st60

Access CQL Engine Metrics

When a pipeline with a query or pattern stage is deployed to a Spark cluster, the complex event processing is performed by a set of CQL engines running inside the Spark cluster, and you can monitor them using the CQL Engine metrics.

CQL queries can aggregate, correlate, filter, and pattern match over a stream of events. Spark provides an out-of-the-box pipeline UI (commonly running on <host>:4040) that helps users monitor a running Spark Streaming pipeline. Because CQL queries also run as part of a Spark Streaming pipeline, the Spark pipeline UI is extended to include monitoring capabilities for CQL queries.

To access CQL Engine metrics:

  1. Create a pipeline with at least one query or pattern stage.

  2. Navigate to Spark Master User Interface.

    [Illustration: application_master.png]

  3. Click the CQL Engine tab.

    [Illustration: cql_engine.png]

    You can see the details of all queries running inside the Spark CQL pipeline. This page also shows the various streams/relations and external relations registered as part of the pipeline.

  4. Click any query to see its details. The query details page shows partition-wise details about that particular running query.

  5. Click a specific partition link to see further details about the query plan and operator-level details. This page shows the operator-level details of the query processing that particular partition.

    [Illustration: cql_engine_detailed_analysis.png]

Troubleshoot Pipeline Deployment

Sometimes pipeline deployment fails with the following exception:

Spark pipeline did not start successfully after 60000 ms.

This exception usually occurs when you do not have free resources on your cluster.

Workaround:

Use an external Spark cluster, or use a machine with more capacity and configure the cluster with more resources. A hedged way to check free cluster resources is shown below.
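As a hedged sketch for YARN-based clusters, you can query the ResourceManager's cluster metrics REST endpoint to see the available memory and virtual cores before deploying the pipeline. The ResourceManager host and port below are placeholders (8088 is the common default):

  curl http://resourceManagerHost:8088/ws/v1/cluster/metrics

Look at the availableMB and availableVirtualCores fields in the response; recall that Stream Analytics needs a minimum of 3 VCORES for a pipeline.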