PySpark

PySpark 3.0 and Data Flow CPU on Python 3.7 (version 5.0)

A description of the PySpark 3.0 and Data Flow CPU on Python 3.7 (version 5.0) conda environment.

Released

June 24, 2022

Description

Apply the power of Apache Spark and MLlib to speed up your model building. Use PySparkSQL to analyze structured and semi-structured data stored in Object Storage; these files can be accessed with Resource Principals for simple, secure authentication. PySpark brings the full power of parallel computing to your notebook session. For larger jobs, you can develop Spark applications and submit them to the Data Flow service.

To get started with this conda environment, review the getting-started.ipynb notebook example; see Using the Notebook Explorer to access Notebook Examples.

Python Version

3.7

Slug

pyspark30_p37_cpu_v5

Object Storage Path

oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/PySpark_3.0_and_Data_Flow/5.0/pyspark30_p37_cpu_v5

Top Libraries

  • oracle-ads (v2.6.1)

  • oraclejdk (v8)

  • pyspark (v3.0.2)

  • sparksql-magic (v0.0.3)

  • sparkmagic (v0.20.0)

For a complete list of preinstalled Python libraries, see pyspark30_p37_cpu_v5.txt.

Example Notebooks

  • getting-started.ipynb

PySpark 3.0 and Data Flow CPU on Python 3.7 (version 4.0)

A description of the PySpark 3.0 and Data Flow CPU on Python 3.7 (version 4.0) conda environment.

Released

March 29, 2022

Description

Apply the power of Apache Spark and MLlib to speed up your model building. Use PySparkSQL to analyze structured and semi-structured data stored in Object Storage; these files can be accessed with Resource Principals for simple, secure authentication. PySpark brings the full power of parallel computing to your notebook session. For larger jobs, you can develop Spark applications and submit them to the Data Flow service.

To get started with this conda environment, review the getting-started.ipynb notebook example; see Using the Notebook Explorer to access Notebook Examples.

Python Version

3.7

Slug

pyspark30_p37_cpu_v4

Object Storage Path

oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/PySpark 3.0 and Data Flow/4.0/pyspark30_p37_cpu_v4

Top Libraries

  • oracle-ads (v2.5.8)

  • oraclejdk (v8)

  • pyspark (v3.0.2)

  • scikit-learn (v1.0.2)

  • sparksql-magic (v0.0.3)

For a complete list of preinstalled Python libraries, see pyspark30_p37_cpu_v4.txt.

Example Notebooks

  • getting-started.ipynb

  • api_keys.ipynb

  • caltech.ipynb

  • data_flow.ipynb

  • model_catalog.ipynb

  • model_deployment.ipynb

  • model_deployment_using_jobs.ipynb

  • project.ipynb

  • pyspark.ipynb

  • pyspark_adb.ipynb

  • pyspark_adb_dtypes.ipynb

  • pyspark_adb_partition.ipynb

  • pyspark_pushdown.ipynb

  • vault.ipynb

  • visual_genome.ipynb

Using the Notebook Explorer to access Notebook Examples describes how to locate and access the included interactive example notebooks, and what each of them can be used for.

PySpark 3.0 and Data Flow CPU on Python 3.7 (version 3.0)
PySpark 3.0 and Data Flow CPU on Python 3.7 (version 2.0) [Removed]
PySpark 3.0 and Data Flow CPU on Python 3.7 (version 1.0) [Removed]
PySpark 2.4 and Data Flow CPU on Python 3.7 (version 3.0)

A description of the PySpark 2.4 and Data Flow CPU on Python 3.7 (version 3.0) conda environment.

Released

March 29, 2022

Description

Apply the power of Apache Spark and MLlib to speed up your model building. Use PySparkSQL to analyze structured and semi-structured data stored in Object Storage; these files can be accessed with Resource Principals for simple, secure authentication. PySpark brings the full power of parallel computing to your notebook session. For larger jobs, you can develop Spark applications and submit them to the Data Flow service.

To get started with this conda environment, review the getting-started.ipynb notebook example; see Using the Notebook Explorer to access Notebook Examples.

Python Version

3.7

Slug

pyspark24_p37_cpu_v3

Object Storage Path

oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/PySpark and Data Flow/3.0/pyspark24_p37_cpu_v3

Top Libraries

  • oracle-ads (v2.5.8)

  • oraclejdk (v8)

  • pyspark (v2.4.4)

  • scikit-learn (v1.0.2)

  • sparksql-magic (v0.0.3)

For a complete list of preinstalled Python libraries, see pyspark24_p37_cpu_v3.txt.

Example Notebooks

  • getting-started.ipynb

  • api_keys.ipynb

  • caltech.ipynb

  • data_flow.ipynb

  • project.ipynb

  • pyspark.ipynb

  • pyspark_adb.ipynb

  • pyspark_adb_dtypes.ipynb

  • pyspark_adb_partition.ipynb

  • vault.ipynb

  • visual_genome.ipynb

Using the Notebook Explorer to access Notebook Examples describes how to locate and access the included interactive example notebooks, and what each of them can be used for.

PySpark 2.4 and Data Flow CPU on Python 3.7 (version 2.0) [Removed]
PySpark 2.4 and Data Flow CPU on Python 3.7 (version 1.0) [Removed]
PySpark (version 1.0)