Configuring Spark Structured Streaming using Workflows

You can configure a streaming task inside a workflow to continuously process streaming data.

To begin using workflows with streaming in Oracle AI Data Platform, first create a job and then add a Notebook or Python task to that job.
  1. Navigate to your workspace and click Workflow.
  2. Click Create Job.
  3. Provide a name and description for your job.
  4. Click Browse and select the location to save the job in your AI Data Platform. Click Select.
  5. Enter 1 for Max Concurrent Runs.
  6. Click Create.
  7. Click the job you just created.
  8. Click Add task.
  9. Provide a name for your task.
  10. Select Notebook or Python for Task type.
  11. Click Browse and navigate to the Notebook or Python script you want to add as a Streaming task (a sample script follows this procedure). Click Select.
  12. Select a compute cluster for the Notebook or Python task, if one is not already attached.
  13. Select the Streaming checkbox. Selecting Streaming disables the execution timeout and task dependency options.

    Figure: the Create Task details page with the Streaming checkbox selected

  14. Select the number of retries a task should attempt on failure. If you select more than 0, you must also specify how long the job run should wait between retries and whether retries should be attempted on timeout.

    Figure: task retry options when the number of retries is 1 or greater

  15. Click Run Now.
Once started, a Streaming task continues to run until you manually stop it. During regular monthly maintenance, the service stops and restarts the Streaming task without requiring any action on your part.
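A Streaming task typically wraps a long-running Spark Structured Streaming query. The following is a minimal sketch of what the Python script selected in step 11 might contain; the rate source and the output and checkpoint paths are illustrative placeholders, not values specific to Oracle AI Data Platform.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-task").getOrCreate()

# Read a continuous stream. The built-in "rate" source emits
# timestamped rows and is convenient for testing; replace it with
# your real source (Kafka, files, and so on).
events = (
    spark.readStream
    .format("rate")
    .option("rowsPerSecond", 10)
    .load()
)

# Write the stream out. The checkpoint location lets the query resume
# from its last committed offsets if the task is stopped and
# restarted, for example during monthly maintenance.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/tmp/streaming-output")              # placeholder sink path
    .option("checkpointLocation", "/tmp/streaming-ckpt")  # placeholder path
    .outputMode("append")
    .start()
)

# Block so the task keeps running until it is stopped manually
# or by the service.
query.awaitTermination()
```

Because the script blocks on awaitTermination(), the task runs until it is stopped, and the checkpoint location allows the query to pick up where it left off after the service restarts it.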