PySpark

Learn about PySpark conda environments.

PySpark 3.2 and Feature Store (version 3.0)

A description of the PySpark 3.2 and Feature Store on Python 3.8 (version 3.0) conda environment.

Released

February 9, 2024

Description

The Feature Store conda environment includes the feature store package and the Data Flow magic commands. The feature store package provides a centralized solution for data transformation and access during training and serving, and establishes a standardized pipeline for data ingestion and querying. The Data Flow magic commands manage the lifecycle of a remote Data Flow Session cluster and run Spark code snippets remotely in that cluster. This conda environment supports ingesting data in the Delta format, making Delta a first-class citizen within the system. The Oracle Data Science feature store also supports the DCAT Hive Metastore, which serves as a registry for schema metadata and lets users register and manage the metadata associated with schemas.

To get started with the Feature Store environment, open the getting-started notebook from the Launcher.

Python Version

3.8

Slug

fspyspark32_p38_cpu_v3

Object Storage Path

oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/PySpark_3.2_and_Feature_Store/3.0/fspyspark32_p38_cpu_v3
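The Object Storage path follows the `oci://bucket@namespace/object-path` form, and its last segment is the slug. The minimal sketch below splits such a URI into its parts and decodes the slug. It assumes slugs follow a `<name>_p<python>_<compute>_v<version>` pattern, which is inferred from the slugs on this page rather than taken from a documented format. `parse_oci_uri`, `parse_slug`, `OciLocation`, and `SlugInfo` are hypothetical helpers, not part of any Oracle library.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class OciLocation:
    bucket: str
    namespace: str
    object_path: str


@dataclass(frozen=True)
class SlugInfo:
    name: str
    python_version: str
    compute: str
    version: int


# Assumed slug pattern, inferred from examples such as fspyspark32_p38_cpu_v3.
# The cpu|gpu alternatives are also an assumption.
_SLUG_RE = re.compile(
    r"^(?P<name>.+)_p(?P<major>\d)(?P<minor>\d+)_(?P<compute>cpu|gpu)_v(?P<version>\d+)$"
)


def parse_oci_uri(uri: str) -> OciLocation:
    """Split an oci://bucket@namespace/object-path URI into its parts."""
    prefix = "oci://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an oci:// URI: {uri!r}")
    location, _, object_path = uri[len(prefix):].partition("/")
    bucket, sep, namespace = location.partition("@")
    if not sep or not bucket or not namespace:
        raise ValueError(f"expected bucket@namespace, got {location!r}")
    return OciLocation(bucket, namespace, object_path)


def parse_slug(slug: str) -> SlugInfo:
    """Decode a conda slug under the assumed naming pattern."""
    match = _SLUG_RE.match(slug)
    if match is None:
        raise ValueError(f"slug does not match the assumed pattern: {slug!r}")
    return SlugInfo(
        name=match["name"],
        python_version=f"{match['major']}.{match['minor']}",
        compute=match["compute"],
        version=int(match["version"]),
    )


if __name__ == "__main__":
    path = (
        "oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/"
        "PySpark_3.2_and_Feature_Store/3.0/fspyspark32_p38_cpu_v3"
    )
    loc = parse_oci_uri(path)
    print(loc.bucket, loc.namespace)  # service-conda-packs id19sfcrra6z
    # The slug is the last segment of the object path.
    slug = loc.object_path.rsplit("/", 1)[-1]
    print(parse_slug(slug))
```

Because the slug is the last path segment, the two helpers can be chained, as the `__main__` block does. The regex is deliberately strict, so a slug that does not fit the assumed pattern raises `ValueError` instead of being decoded incorrectly.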

Top Libraries

  • Data Flow Sparkmagic (v1.0.14)
  • oracle-ads (v2.10.0)
  • oraclejdk (v8)
  • pyspark (v3.2.1)
  • sparksql-magic (v0.0.3)
  • oracle-ml-insights (v1.0.4)
  • spark-nlp (v4.2.1)
  • transformers (v4.32.1)
  • langchain (v0.0.267)

For a complete list of preinstalled Python libraries, see fspyspark32_p38_cpu_v3.txt.
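To confirm that an activated environment matches the versions listed above, the standard library's `importlib.metadata` can compare installed distributions against the list. This is a minimal sketch. `EXPECTED` and `find_mismatches` are hypothetical names, and the dictionary keys are assumed to be the packages' PyPI distribution names. Data Flow Sparkmagic is omitted because its distribution name is not given here, and oraclejdk is omitted because it is a Java runtime rather than a Python distribution.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Optional, Tuple

# Versions copied from the "Top Libraries" list for fspyspark32_p38_cpu_v3.
# Assumption: each key is the package's PyPI distribution name.
EXPECTED: Dict[str, str] = {
    "oracle-ads": "2.10.0",
    "pyspark": "3.2.1",
    "sparksql-magic": "0.0.3",
    "oracle-ml-insights": "1.0.4",
    "spark-nlp": "4.2.1",
    "transformers": "4.32.1",
    "langchain": "0.0.267",
}

# (distribution name, expected version, installed version or None)
Mismatch = Tuple[str, str, Optional[str]]


def find_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> List[Mismatch]:
    """Return one entry per package whose installed version differs from the
    expected one. The installed version is None when the package is missing."""
    mismatches: List[Mismatch] = []
    for name, wanted in expected.items():
        try:
            found: Optional[str] = get_version(name)
        except PackageNotFoundError:
            found = None
        # Exact string comparison; no version normalization is applied.
        if found != wanted:
            mismatches.append((name, wanted, found))
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    for name, wanted, found in problems:
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
    if not problems:
        print("All listed libraries match the expected versions.")
```

The `get_version` parameter defaults to `importlib.metadata.version` but can be replaced with a stub. This makes the comparison logic testable without the environment installed. The code uses `typing` aliases rather than built-in generics so that it runs on Python 3.8, the version this environment ships with.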

PySpark 3.2 and Feature Store (version 2.0)

A description of the PySpark 3.2 and Feature Store on Python 3.8 (version 2.0) conda environment.

Released

December 1, 2023

Description

The Feature Store conda environment includes the feature store package and the Data Flow magic commands. The feature store package provides a centralized solution for data transformation and access during training and serving, and establishes a standardized pipeline for data ingestion and querying. The Data Flow magic commands manage the lifecycle of a remote Data Flow Session cluster and run Spark code snippets remotely in that cluster. This conda environment supports ingesting data in the Delta format, making Delta a first-class citizen within the system. The Oracle Data Science feature store also supports the DCAT Hive Metastore, which serves as a registry for schema metadata and lets users register and manage the metadata associated with schemas.

To get started with the Feature Store environment, review the getting-started notebook.

Python Version

3.8

Slug

fspyspark32_p38_cpu_v2

Object Storage Path

oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/PySpark_3.2_and_Feature_Store/2.0/fspyspark32_p38_cpu_v2

Top Libraries

  • Data Flow Sparkmagic (v1.0.14)
  • oracle-ads (v2.9.0)
  • oraclejdk (v8)
  • pyspark (v3.2.1)
  • sparksql-magic (v0.0.3)
  • spark-nlp (v4.2.1)
  • transformers (v4.32.1)
  • langchain (v0.0.267)

For a complete list of preinstalled Python libraries, see fspyspark32_p38_cpu_v2.txt.

PySpark 3.2 and Big Data (version 2.0)
PySpark 3.2 and Data Flow CPU on Python 3.8 (version 3.0)
PySpark 3.2 and Data Flow CPU on Python 3.8 (version 2.0)