Common Issues and Remedies

This section provides a comprehensive list of items to verify if pipelines are not running as expected.

Pipeline

Common issues encountered while deploying pipelines are listed in this section.

Pipelines are not running as expected

If a pipeline is not running as expected, verify the following:

Ensure that Pipeline is Deployed Successfully

To verify pipeline deployment on an Apache Spark installation-based Spark cluster:
  1. Open the Spark Master console user interface.

  2. Locate the application corresponding to your pipeline. If its status is Running, the pipeline is currently deployed and running successfully.
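
    If the console is not convenient to reach, the Spark standalone master also exposes the same status as JSON. A minimal sketch, assuming the master web UI listens on the default port 8080 (SPARK_MASTER_HOST is a placeholder for your master host):

      # List the applications known to the Spark standalone master
      curl http://SPARK_MASTER_HOST:8080/json/

    Look for the application corresponding to your pipeline with state RUNNING among the active applications.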

Ensure that the Input Stream is Supplying Continuous Stream of Events to the Pipeline

To check for a continuous supply of events from the input stream:
  1. Go to the Catalog.

  2. Locate and click the stream you want to troubleshoot.

  3. Check the value of the topicName property under the Source Type Parameters section.

  4. Since this topic is created using Kafka APIs, you cannot consume this topic with REST APIs.

    Listen to the Kafka topic hosted on a standard Apache Kafka installation.

    You can listen to the Kafka topic using the utilities bundled with any Kafka installation; kafka-console-consumer.sh is one such utility script.

    Follow these steps to listen to Kafka topic:

    1. Determine the Zookeeper Address from Apache Kafka Installation based Cluster.

    2. Use the following command to listen to the Kafka topic, where topicName is the value noted in step 3:
      ./kafka-console-consumer.sh --zookeeper IPAddress:2181 --topic topicName
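
      The --zookeeper option applies only to older Kafka consumers and has been removed in newer Kafka versions. A minimal sketch of the equivalent command for newer versions, assuming a broker listens on IPAddress:9092:

      ./kafka-console-consumer.sh --bootstrap-server IPAddress:9092 --topic topicName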

Ensure that the Output Stream is available in the Monitor Topic

To check if the output stream is available in monitor topic:
  1. Navigate to Catalog.

  2. Open the required pipeline.

  3. Ensure that you stay in the pipeline editor and do not click Done. Otherwise, the pipeline gets undeployed.

  4. Right-click anywhere in the browser and select Inspect.

  5. Go to WS tab under the Network tab.

  6. Refresh the browser.

    New websocket connections are created.

  7. Locate a websocket whose URL has a parameter with the name topic.

    The value of topic param is the name of Kafka Topic where the output of this stage (query or pattern) is pushed.

    [Illustration ws_network.png: the WS tab of the browser Network panel showing the websocket connection and its topic parameter]

    The topic name is of the form AppName_StageId. The pipeline name can be derived from the topic name by removing the _StageId suffix. In the above snapshot, the pipeline name is sx_2_49_12_pipe1_draft.

Ensure that Caching is Working if a Pipeline is Correlating a Stream with Reference
  1. Go to Spark application master UI.

  2. Open Pipeline Summary Page after clicking on Pipeline Tab. The pipeline summary page shows a table of stages with various metrics.

  3. Click the stage id corresponding to the Query stage of the pipeline in which you are correlating a stream with a reference.

  4. In the Pipeline Stage Details page, click the CQLDStream stage to open CQL Engine Summary page.

  5. In the CQL Engine Summary page, locate the External Sources section. Note the source id for the reference which is used in stage.

  6. Open the CQL Engine Detailed Query Analysis page.

  7. Click the operator whose operator id matches the source id to open the CQL Engine Detailed Operator Analysis page, and then click the entry corresponding to the operator.

  8. Look for the Cache Statistics section. If there is no such section, then caching is not enabled for the stage. If you see non-zero entries in Cache Hits and Cache Misses, then caching is enabled and active for the pipeline stage.

GGSA Pipeline getting Terminated

A GGSA pipeline can terminate if the targets and references used in the pipeline are unreachable or their resources are unavailable. You could see the following exceptions in the logs:

com.tangosol.net.messaging.ConnectionException

SQLException in the Spark logs.

In case of a Kafka source, republish the pipeline to resume reading records from where it left off before terminating.

Live Table Shows Listening Events with No Events in the Table

There can be multiple reasons why the live table shows no events even though the pipeline status is Listening Events. Following are the steps to troubleshoot this scenario:

  1. The live table shows output events of only the currently selected stage. At any time, only one stage is selected. Try switching to a different stage. If you observe output in the live table for another stage, the problem is likely with the selected stage. To debug further, go to step 5.

    If there is no output in any stage, then move to step 2.

  2. Ensure that the pipeline is still running on the Spark cluster. See Ensure that Pipeline is Deployed Successfully.

  3. If the Spark application for your pipeline is killed or aborted, then it suggests that the pipeline has crashed. To troubleshoot further, you may need to look into application logs.

  4. If the application is in the ACCEPTED, NEW, or SUBMITTED state, the application is waiting for cluster resources and has not yet started. If there are not enough resources, check the number of VCORES in the Big Data Cloud Service Spark Yarn cluster. For a pipeline, Stream Analytics requires a minimum of 3 VCORES.
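
    If the cluster manager is YARN, a quick way to check the application state from the command line is the YARN CLI. A minimal sketch, assuming the yarn client on the machine is configured against your cluster:

      # List applications that are waiting for resources or already running
      yarn application -list -appStates NEW,SUBMITTED,ACCEPTED,RUNNING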

  5. If the application is in the RUNNING state, use the following steps to troubleshoot further:

    1. Ensure that the input stream is pushing events continuously to the pipeline.

    2. If the input stream is pushing events, ensure that each of the pipeline stages is processing events and providing outputs.

    3. If both of the above steps are verified successfully, then ensure that the pipeline is able to push the output events of each stage to its corresponding monitor topic:

      1. Determine the monitor topic for the stage, into which the output of the stage is being pushed. See Determine the Topic Name where Output of Pipeline Stage is Propagated.

      2. Listen to the monitor topic and ensure that the events are continuously being pushed to the topic. To listen to the Kafka topic, you must have access to the Kafka cluster where the topic is created. You can listen to the topic using utilities from a Kafka installation; kafka-console-consumer.sh is a utility script available as part of any Kafka installation.
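
        A minimal sketch, assuming the monitor topic is named PipelineName_StageID (see the next section) and ZooKeeper listens on IPAddress:2181:

          ./kafka-console-consumer.sh --zookeeper IPAddress:2181 --topic PipelineName_StageID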

      3. If you don't see any events in the topic, then this can be an issue related to writing output events from stage to monitor topic. Check the server logs and look for any exception and then report to the administrator.

      4. If you can see output events in the monitor topic, then the issue can be related to reading output events in the web browser.

Determine the Topic Name where Output of Pipeline Stage is Propagated

Here are the steps to find the topic name for a stage:

  1. Open the Pipeline Summary Page for your pipeline. If you don't know the corresponding application name for this pipeline, see Ensure that the Output Stream is available in the Monitor Topic for instructions.

  2. This page will provide the Pipeline Name and various stage ids. For every pipeline stage, you will see an entry in the table.

  3. For every stage, the output topic id will be PipelineName_StageID.

  4. Click Done in the pipeline editor and then go back to Catalog and open the pipeline again.

Live Table Still Shows Starting Pipeline

There can be multiple reasons why status of pipeline has not changed to Listening Events from Starting Pipeline. Following are the steps to troubleshoot this scenario:

  1. Ensure that the pipeline has been successfully deployed to the Spark cluster. For more information, see Ensure that Pipeline is Deployed Successfully. Also ensure that the Spark cluster is not down and is available.

  2. If the deployment failed, check the Jetty logs to see the exceptions related to the deployment failure and fix the issues.

  3. If the deployment is successful, verify that the OSA web tier has received the pipeline deployment from Spark.

  4. Click Done in the pipeline editor and then go back to Catalog and open the pipeline again.

Ensure that a Pipeline Stage is Still Processing Events

To verify if a particular pipeline stage is still processing events:
  1. Go to Spark application master UI.

  2. Open Pipeline Summary Page after clicking on Pipeline Tab. The pipeline summary page shows a table of stages with various metrics.

  3. Check if Total Output Events is a non-zero value. Refresh the page to see if the value increases or stays the same. If the value remains the same and does not change for a long time, then drill down into the stage details.

Time-out Exception in the Spark Logs when you Unpublish a Pipeline

In the Jetty log look for the following message:

OsaSparkMessageQueue:182 - received: oracle.wlevs.strex.spark.client.spi.messaging.AcknowledgeMessage Undeployment Ack: oracle.wlevs.strex.spark.client.spi.messaging.AcknowledgeMessage

During an application shutdown, a pipeline may take several minutes to unpublish completely.

If you do not see the above message, you may need to increase the osa.spark.undeploy.timeout value accordingly.
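
The timeout value is presumably in milliseconds, matching the 60000 in the error message below. For example, assuming the property is passed to the OSA web tier as a JVM system property (an assumption; check how your installation's start script sets OSA properties), doubling the timeout would look like:

  -Dosa.spark.undeploy.timeout=120000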

Also, in the High Availability mode, the snapshot folder is deleted when a pipeline is unpublished.

In HA mode, if you do not receive the above message in time and see the following error:

Undeployment couldn't be complete within 60000

then the snapshot folder may not be completely cleaned. This only means that processing will not be impacted, but some disk space will remain occupied.

Piling up of Queued Batches in HA mode

If a GGSA pipeline is deployed in the High Availability mode in a Spark Standalone cluster, every time there is a target or reference failure, Spark spins off new drivers. In case these targets and references are not recoverable, it results in a loop of queued up batches.
To resolve this issue, you have to unpublish the application manually.

Null Record from Summary in Query Stage

When you publish a pipeline for the first time, with a summary in one of the Query stages, the first record is null in all columns. This causes the pipeline to fail if it has targets for which a key is required.

To solve this issue, check if a summary stage is added before a target stage, and if so, add a query stage with a filter that checks for null values in the cache keys.
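
Conceptually, the filter in the added query stage is a null check on the key column. A hypothetical sketch of the condition (keyColumn stands for your cache key column; in the pipeline editor this is configured as a filter condition on the stage):

  keyColumn IS NOT NULL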

Stream

Common issues encountered with streams are listed in this section.

Cannot See Any Kafka Topic or a Specific Topic in the List of Topics

Use the following steps to troubleshoot:

  1. Go to Catalog and select the Kafka connection which you are using to create the stream.

  2. Click Next to go to Connection Details tab.

  3. Click Test Connection to verify that the connection is still active.

  4. Ensure that the topic or topics exist in the Kafka cluster. You must have access to the Kafka cluster where the topic is created. You can list all the Kafka topics using utilities from a Kafka installation; kafka-topics.sh is a utility script available as part of any Kafka installation, as shown below.
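
    A minimal sketch of listing the topics, assuming ZooKeeper for the Kafka cluster listens on IPAddress:2181 (on newer Kafka versions, use --bootstrap-server IPAddress:9092 instead of --zookeeper):

      ./kafka-topics.sh --zookeeper IPAddress:2181 --list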

  5. If you cannot see any topic using the above command, ensure that you create the topic.

  6. If the test connection fails and you see an error message like OSA-01266 Failed to connect to the ZooKeeper server, then the Kafka cluster is not reachable. Ensure that the Kafka cluster is up and running.

Input Kafka Topic is Sending Data but No Events Seen in Live Table

This can happen if the incoming events do not adhere to the expected shape for the Stream. To check if the events are dropped due to a shape mismatch, use the following steps to troubleshoot:

  1. Verify whether the lenient parameter under Source Type Properties for the Stream is selected. If it is FALSE, then the events may have been dropped due to a shape mismatch. To confirm this, check the application logs for the running application.

  2. If the property is set to TRUE, debug further:

    1. Make sure that the Kafka cluster is up and running, and that the Spark cluster is able to access it.

    2. If the Kafka cluster is up and running, obtain the application logs for further debugging.
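
      If the pipeline runs on YARN, the application logs can typically be collected with the YARN CLI. A minimal sketch, where application_id is the ID displayed for the pipeline application in the resource manager UI:

        yarn logs -applicationId application_id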

Connection

Common issues encountered with connections are listed in this section.

Database Connection Failure

To test a Database connection:

  1. From the Catalog page, select the database connection that you want to test.

  2. Click Next.

  3. On the Connection Details tab, click Test Connection.
    • If the test is successful, it indicates that the connection is active.
    • If the test fails, the following error messages are displayed:

      • OSA-01260 Failed to connect to the database. IO Error: The Network Adapter could not establish the connection: This error message indicates that the DB host is not reachable from GGSA design time.

      • OSA-01260 Failed to connect to the database. ORA-01017: invalid username/password; logon denied: This error message indicates that your login credentials are incorrect.
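
    A quick network-level check from the GGSA design-time host can help distinguish the two cases. A minimal sketch, assuming the database listener runs on dbhost and the default Oracle listener port 1521:

      # Succeeds only if the listener port is reachable from this host
      nc -vz dbhost 1521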

Druid Connection Failure

To test a Druid connection:

  1. From the Catalog page, select the Druid connection that you want to test.

  2. Click Next.

  3. On the Connection Details tab, click Test Connection.
    • If the test is successful, it indicates that the connection is active.
    • If the test fails, the following error message is displayed:

      OSA-01460 Failed to connect to the druid services at zooKeeper server: This error indicates that the druid zookeeper is not reachable. Ensure that the druid services and zookeeper cluster are up and running.
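
      To quickly verify that the ZooKeeper ensemble used by Druid is reachable, you can send it the ruok four-letter command. A minimal sketch, assuming ZooKeeper listens on ZK_HOST:2181 and the four-letter-word commands are not disabled on the server:

        # A healthy ZooKeeper server replies with "imok"
        echo ruok | nc ZK_HOST 2181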

Coherence Connection Failure

GGSA does not provide a Test connection option for a Coherence cluster. Refer to the Oracle Coherence documentation to find utilities and tools to test a Coherence connection.

JNDI Connection Failure

To test a JNDI connection:

  1. From the Catalog page, select the JNDI connection that you want to test.

  2. Click Next.

  3. On the Connection Details tab, click Test Connection.
    • If the test is successful, it indicates that the connection is active.
    • If the test fails, the following error messages are displayed:

      • OSA-01707 Communication with server failed. Ensure that server is up and server url(s) are specified correctly: This error indicates that either the server is down or server url(s) is incorrectly specified. Server url should be of the format host1:port1,host2:port2.

      • OSA-01706 JNDI connection failed. User: weblogic, failed to be authenticated: This error indicates that the login credentials are incorrect.

Target

Common issues encountered with targets are listed in this section.

Cannot see any Events in Targets

If the pipeline is in the draft mode, it cannot push events to targets. Only published pipelines can push events to targets.

Geofence

Common issues encountered with geofences are listed in this section.

Name and Description Fields are not displayed for the DB-based Geofences

If the name and description fields are not displayed for a database-based geofence, follow the steps below:

  1. Go to Catalog and click Edit for the required database-based geofence.

  2. Click Edit for Source Type Properties and then Next.

  3. Ensure that the mapping for Name and Description is defined in Shape section.

  4. Once these mappings are defined, you can see the name and description for the geofence.

DB-based Geofence is not Working

To ensure that a database-based geofence is working:

  1. Go to Catalog and open the required database-based geofence.

  2. Ensure that the connection used in the geofence is active by clicking the Test Connection button in the database connection wizard.

  3. Ensure that the table used in the geofence is still valid and exists in the database.

  4. Go to the geofence page and verify that the issue is resolved.

Cube

Common issues encountered with cubes are listed in this section.

Unable to Explore Cube which was Working Earlier

If you are unable to explore a cube which was working earlier, follow the steps mentioned below:

  1. Check if the Druid ZooKeeper or the associated services for the indexer, broker, middle manager, or overlord are down.

  2. Click the Druid connection and navigate to next page.

  3. Test the connection. This step will tell you if the services are down and need to be looked into.

Cube Displays "Datasource not Ready"

If you keep seeing the “Datasource not Ready” message when you explore a cube, follow the steps below:

  1. Go to the Druid indexer logs. Generally, the console is at http://DRUID_HOST:3090/console.html.

  2. Look for an entry in the running tasks named index_kafka_<cube-name>_<somehash>. If there is no entry in the running tasks, look for the same entry in the pending tasks or completed tasks.
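
    If the console is not accessible, the same information is exposed by the Overlord task APIs. A minimal sketch, assuming the Overlord API is reachable on the same host and port as the console above:

      curl http://DRUID_HOST:3090/druid/indexer/v1/runningTasks
      curl http://DRUID_HOST:3090/druid/indexer/v1/pendingTasks
      curl http://DRUID_HOST:3090/druid/indexer/v1/completeTasks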

  3. If the entry is in the pending tasks, it means that the workers are running out of capacity, and the datasource will get picked up for indexing as soon as capacity is available.

  4. In such cases, either wait, increase the worker capacity and restart the Druid services, or kill some existing datasource indexing tasks (they will get started again after some time).

  5. If the entry is in the completed tasks with a FAILED status, it means that indexing failed either due to an incorrect ingestion spec or due to a resource issue.

  6. You can find the exact reason by clicking "log (all)" link and navigating to the exception.

  7. If it is due to ingestion, try changing the timestamp format. (Druid fails to index if the timestamp is not in a Joda time format, or if the time format specified does not match the format of the timestamp values.)

Dashboard

Common issues encountered with dashboards are listed in this section.

Visualizations Appearing Earlier are No Longer Available in Dashboard

Use the following steps to troubleshoot:

  1. A missing streaming visualization might be due to the following reasons:

    1. The corresponding pipeline or stage for the missing visualization no longer exists.

    2. The visualization itself was removed from the catalog or the pipeline editor.

  2. A missing exploration visualization (created from a cube) might be due to the cube or the visualization having been deleted.

Dashboard Layout Reset after You Resized or Moved the Visualizations

Use the following steps to troubleshoot:

  1. This might happen if you forget to save the dashboard after moving or resizing the visualizations.

  2. Make sure to click Save after changing the layout.

Streaming Visualizations Do not Show Any Data

Use the following steps to troubleshoot:

  1. Go to the visualization in the pipeline editor and make sure that the live output table is displaying data.

  2. If there is no output, ensure that the pipeline is deployed and running on the cluster. Once the live output table displays data, it shows up in the streaming visualization.

Live Output

Common issues encountered with live output are listed in this section.

Issues with Live Output

For every pipeline, there will be one Spark streaming pipeline running on the Spark cluster. If a Stream Analytics pipeline uses one or more Query or Pattern stages, then the pipeline will run one or more continuous queries for each of these stages.

For more information about continuous query, see Understanding Oracle CQL.

Ensure that CQL Queries for Each Query Stage Emit Output

To check if the CQL queries are emitting output events, monitor the CQL queries using the CQL Engine metrics:

  1. Open CQL Engine Query Details page. For more information, see Access CQL Engine Metrics.

  2. Check that at least one partition has Total Output Events greater than zero under the Execution Statistics section.

    [Illustration cql_engine_query_details.png: the CQL Engine Query Details page showing Total Output Events per partition under Execution Statistics]

    If your query is running without any errors and input data is continuously arriving, the Total Output Events count will keep rising.

Ensure that the Output of Stage is Available

  1. Ensure that you stay in the Pipeline Editor and do not click Done. Otherwise, the pipeline gets undeployed.

  2. Right-click anywhere in the browser and click Inspect.

  3. Select Network from the top tab and then select WS.

  4. Refresh the browser.

    New websocket connections are created.

  5. Locate a websocket whose URL has a parameter with name topic.

    The value of the topic param is the name of the Kafka topic where the output of this stage is pushed.

    [Illustration websocket_network.png: the WS tab of the browser Network panel showing the websocket connection and its topic parameter]

  6. Listen to the Kafka topic where output of the stage is being pushed.

    Since this topic is created using Kafka APIs, you cannot consume this topic with REST APIs. Follow these steps to listen to the Kafka topic:

    1. Listen to the Kafka topic hosted on a standard Apache Kafka installation.

      You can listen to the Kafka topic using the utilities bundled with any Kafka installation; kafka-console-consumer.sh is one such utility script.

      To listen to Kafka topic:

      1. Determine the Zookeeper Address from Apache Kafka Installation based Cluster.

      2. Use the following command to listen to the Kafka topic:
        ./kafka-console-consumer.sh --zookeeper IPAddress:2181 --topic sx_2_49_12_pipe1_draft_st60

Missing Events due to Faulty Data

If the CQL engine encounters faulty data in user-defined functions, the exceptions for the events with faulty data are logged in the executor logs, and the processing continues uninterrupted.

Sample logs of the dropped events and exceptions:

20/04/02 14:41:42 ERROR spark: Fault in CQL query processing. 
Detailed Fault Information [Exception=user defined function(oracle.cep.extensibility.functions.builtin.math.Sqrt@74a5e306) 
runtime error while execution,
 Service-Name=SparkCQLProcessor, Context=phyOpt:1; queries:sx_SquareRootPipeline_osaadmin_draft1
20/04/02 14:41:42 ERROR spark: Continue on exception by dropping faulty event.
20/04/02 14:41:42 ERROR spark: Dropped event details <TupleValue><ObjectName>sx_SquareRootPipeline_osaadmin_draft_1</ObjectName><Timestamp>1585838502000000000</Timestamp>
<TupleKind>PLUS</TupleKind><IntAttribute name="squareNumber"><Value>-2</Value></IntAttribute><IsTotalOrderGuarantee>true</IsTotalOrderGuarantee></TupleValue>:
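
To locate such drops in the executor logs, a simple search is usually sufficient. A minimal sketch, assuming a Spark standalone installation whose executor logs are kept under the Spark work directory:

  grep -R "Continue on exception by dropping faulty event" $SPARK_HOME/work/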

Pipeline Deployment Failure

Sometimes pipeline deployment fails with the following exception:

Spark pipeline did not start successfully after 60000 ms.

This exception usually occurs when you do not have free resources on your cluster.

Workaround:

Use an external Spark cluster, or use a machine with more capacity and configure the cluster with more resources.