PySpark
Learn about PySpark conda environments.
A description of the PySpark 3.2 and Feature Store on Python 3.8 (version 3.0) conda environment.
| Released | February 9, 2024 |
| --- | --- |
| Description | The Feature Store conda environment includes the feature store package, which provides a centralized solution for data transformation and access during training and serving, and establishes a standardized pipeline for data ingestion and querying. It also includes the Data Flow magic commands to manage the lifecycle of a remote Data Flow Session cluster and remotely run Spark code snippets in the cluster. This conda environment supports ingesting data in the Delta format, making it a first-class citizen within the system. The Oracle Data Science feature store supports the DCAT Hive Metastore, which serves as a registry for schema metadata and lets users register and manage the metadata associated with schemas. To get started with the Feature Store environment, review the getting-started notebook using the Launcher. |
| Python Version | 3.8 |
| Slug | fspyspark32_p38_cpu_v3 |
| Object Storage Path | |
| Top Libraries | For a complete list of preinstalled Python libraries, see fspyspark32_p38_cpu_v3.txt. |
A description of the PySpark 3.2 and Feature Store on Python 3.8 (version 2.0) conda environment.
| Released | December 1, 2023 |
| --- | --- |
| Description | The Feature Store conda environment includes the feature store package, which provides a centralized solution for data transformation and access during training and serving, and establishes a standardized pipeline for data ingestion and querying. It also includes the Data Flow magic commands to manage the lifecycle of a remote Data Flow Session cluster and remotely run Spark code snippets in the cluster. This conda environment supports ingesting data in the Delta format, making it a first-class citizen within the system. The Oracle Data Science feature store supports the DCAT Hive Metastore, which serves as a registry for schema metadata and lets users register and manage the metadata associated with schemas. To get started with the Feature Store environment, review the getting-started notebook. |
| Python Version | 3.8 |
| Slug | fspyspark32_p38_cpu_v2 |
| Object Storage Path | |
| Top Libraries | For a complete list of preinstalled Python libraries, see fspyspark32_p38_cpu_v2.txt. |
A description of the PySpark 3.2 and Big Data service (version 2.0) conda environment.
| Released | April 4, 2023 |
| --- | --- |
| Description | Leverage the power of Apache Spark and MLlib to speed up model building. Use PySparkSQL to analyze structured and semi-structured data stored on Oracle Object Storage, Big Data Service, and Data Catalog. PySpark leverages the full power of a notebook session by using parallel computing. For larger jobs, develop Spark applications and then submit them to the Data Flow service. To get started with this conda environment, review the Getting Started notebook using the Launcher. |
| Python Version | 3.8 |
| Object Storage Path | |
| Slug | |
| Top Libraries | For a complete list of preinstalled Python libraries, see bdspyspark32_p38_cpu_v2.txt. |
A description of the PySpark 3.2 and Data Flow CPU on Python 3.8 (version 3.0) conda environment.
| Released | July 10, 2023 |
| --- | --- |
| Description | This conda environment includes the Data Flow magic commands to manage the life cycle of a remote Data Flow Session cluster and remotely run Spark code snippets in the cluster. It lets data scientists leverage Apache Spark, including the machine learning algorithms in MLlib. Use PySparkSQL to analyze structured and semi-structured data stored in Object Storage. PySpark leverages the full power of a notebook session by using parallel computing. Data Flow is also integrated with the Data Catalog Hive Metastore. To get started with this conda environment, review the Getting Started notebook using the Launcher. |
| Python Version | 3.8 |
| Object Storage Path | |
| Slug | |
| Top Libraries | For a complete list of preinstalled Python libraries, see pyspark32_p38_cpu_v3.txt. |
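The magic-command workflow described above looks roughly like the following notebook cells. This is a sketch only: the session configuration values (compartment OCID, shapes, executor count) are placeholders you must replace with values from your own tenancy.

```
%load_ext dataflow.magics

%create_session -l python -c '{"compartmentId": "<compartment_ocid>", \
    "displayName": "demo-session", \
    "sparkVersion": "3.2.1", \
    "driverShape": "VM.Standard2.1", \
    "executorShape": "VM.Standard2.1", \
    "numExecutors": 1}'

%%spark
# This cell runs remotely on the Data Flow Session cluster.
df = spark.range(10)
print(df.count())
```

When you are done, stop the remote cluster with `%stop_session` so it does not keep accruing cost.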
A description of the PySpark 3.2 and Data Flow CPU on Python 3.8 (version 2.0) conda environment.
| Released | December 1, 2022 |
| --- | --- |
| Description | This conda environment includes the Data Flow magic commands to manage the life cycle of a remote Data Flow Session cluster and remotely execute Spark code snippets in the cluster. It lets data scientists leverage Apache Spark, including the machine learning algorithms in MLlib. Use PySparkSQL to analyze structured and semi-structured data stored in Object Storage. PySpark leverages the full power of a notebook session by using parallel computing. Data Flow is also integrated with the Data Catalog Hive Metastore. To get started with this conda environment, review the Getting Started notebook using the Launcher. |
| Python Version | 3.8 |
| Object Storage Path | |
| Slug | |
| Top Libraries | For a complete list of preinstalled Python libraries, see pyspark32_p38_cpu_v2.txt. |