Before you Begin with Data Flow

Before you begin using Data Flow, you must have:

  • An Oracle Cloud Infrastructure account. Trial accounts can be used to demo Data Flow.
  • A Service Administrator role for your Oracle Cloud services. When the service is activated, Oracle sends the credentials and URL to the chosen Account Administrator. The Account Administrator creates an account for each user who needs access to the service.
  • A supported browser, such as:
    • Microsoft Internet Explorer 11.x+
    • Mozilla Firefox ESR 38+
    • Google Chrome 42+

  • A Spark application uploaded to Object Storage. Do not package it in a compressed format such as .zip or .gzip.
  • Data for processing loaded into Oracle Cloud Infrastructure Object Storage. Data can be read from external data sources or clouds. Data Flow optimizes performance and security for data stored in an Oracle Cloud Infrastructure Object Store.
  • The supported application types are:
    • Java
    • Scala
    • SparkSQL
    • PySpark (Python 3 only)
  • This table shows the Spark versions supported by Data Flow.

    Supported Spark Versions

    Spark Version | Hadoop | Java      | Python | Scala   | oci-hdfs    | oci-java-sdk | Spark Documentation
    Spark 3.5.0   | 3.3.4  | 17.0.10   | 3.11.5 | 2.12.18 | 3.3.4.1.4.2 | 3.34.1       | Spark Release 3.5.0 Guide
    Spark 3.2.1   | 3.3.1  | 11.0.14   | 3.8.13 | 2.12.15 | 3.3.1.0.3.2 | 2.45.0       | Spark Release 3.2.1 Guide
    Spark 3.0.2   | 3.2.0  | 1.8.0_321 | 3.6.8  | 2.12.10 | 3.2.1.3     | 1.25.2       | Spark Release 3.0.2 Guide
    Spark 2.4.4   | 2.9.2  | 1.8.0_162 | 3.6.8  | 2.11.12 | 2.9.2.6     | 1.25.0       | Spark Release 2.4.4 Guide

    This table is for reference only, and isn't meant to be comprehensive.
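For deployment or CI scripts, the version matrix above can be encoded as a small lookup so a pre-flight check can confirm, for example, which Python version a chosen Spark version expects. The sketch below is purely illustrative (the dictionary and helper are not part of any Oracle SDK or API), and the values should be kept in sync with Oracle's published support matrix:

```python
# Illustrative encoding of the Data Flow Spark support matrix above.
# Not an official API; update the values as Oracle revises supported versions.
SPARK_MATRIX = {
    "3.5.0": {"hadoop": "3.3.4", "java": "17.0.10", "python": "3.11.5",
              "scala": "2.12.18", "oci-hdfs": "3.3.4.1.4.2", "oci-java-sdk": "3.34.1"},
    "3.2.1": {"hadoop": "3.3.1", "java": "11.0.14", "python": "3.8.13",
              "scala": "2.12.15", "oci-hdfs": "3.3.1.0.3.2", "oci-java-sdk": "2.45.0"},
    "3.0.2": {"hadoop": "3.2.0", "java": "1.8.0_321", "python": "3.6.8",
              "scala": "2.12.10", "oci-hdfs": "3.2.1.3", "oci-java-sdk": "1.25.2"},
    "2.4.4": {"hadoop": "2.9.2", "java": "1.8.0_162", "python": "3.6.8",
              "scala": "2.11.12", "oci-hdfs": "2.9.2.6", "oci-java-sdk": "1.25.0"},
}

def required_python(spark_version: str) -> str:
    """Return the Python version paired with a Spark version.

    Raises KeyError for Spark versions not in the matrix.
    """
    return SPARK_MATRIX[spark_version]["python"]

print(required_python("3.5.0"))  # 3.11.5
```

A script preparing a PySpark application could call `required_python()` before upload and fail fast if the local interpreter does not match the target Spark version's row in the table.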
Note

Avoid entering confidential information when assigning descriptions, tags, or friendly names to your cloud resources through the Oracle Cloud Infrastructure Console, API, or CLI. This applies when creating or editing an application in Data Flow.