Run Spark workloads from Snowflake Notebooks¶
You can run Spark workloads interactively from Snowflake Notebooks without needing to manage a Spark cluster. The workloads run on the Snowflake infrastructure.
To use Snowflake Notebooks as a client for developing Spark workloads to run on Snowflake:
Launch Snowflake Notebooks.
Within the notebook, start a Spark session.
Write PySpark code to load, transform, and analyze data—such as to filter high-value customer orders or aggregate revenue.
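A transformation of the kind described above might look like the following sketch, assuming a Spark session named `spark` from the previous step and an illustrative `orders` DataFrame:

```python
from pyspark.sql import functions as F

# Illustrative data; in practice you would load a real table,
# for example with spark.read.table("orders").
orders = spark.createDataFrame(
    [("cust_a", 1200.0), ("cust_b", 80.0), ("cust_a", 300.0)],
    ["customer_id", "order_total"],
)

# Filter high-value orders, then aggregate revenue per customer.
high_value = orders.filter(F.col("order_total") > 100)
revenue = high_value.groupBy("customer_id").agg(
    F.sum("order_total").alias("revenue")
)
revenue.show()
```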
Use a Snowflake Notebook that runs on a warehouse¶
For more information about Snowflake Notebooks, see Create a notebook.
Create a Snowflake Notebook by completing the following steps:
Sign in to Snowsight.
At the top of the navigation menu, select + (Create) » Notebook » New Notebook.
In the Create notebook dialog, enter a name, database, and schema for the new notebook.
For Runtime, select Run on warehouse.
For Runtime version, select Snowflake Warehouse Runtime 2.0.
Selecting version 2.0 ensures that you have the dependency support you need, including Python 3.10. For more information, see Notebook runtimes.
For Query warehouse and Notebook warehouse, select the warehouses that run your query code and your kernel and Python code, as described in Create a notebook.
Select Create.
In the notebook you created, under Packages, ensure that you have the following packages listed to support code in your notebook:
Python, version 3.10 or later
snowpark-connect, latest version
If you need to add these packages, use the following steps:
Under Anaconda Packages, type the package name in the search box.
Select the package name.
Select Save.
To connect to the Snowpark Connect for Spark server and test the connection, copy the following code and paste it in the Python cell of the notebook you created:
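A minimal connection test, assuming the snowpark_connect package exposes start_session and get_session entry points (adjust to the version you have installed), might look like this:

```python
import snowpark_connect

# Start the Snowpark Connect for Spark session. The server runs on
# Snowflake infrastructure; no Spark cluster is required.
snowpark_connect.start_session()
spark = snowpark_connect.get_session()

# A trivial query to confirm the connection works.
spark.sql("SELECT 1 AS connection_test").show()
```

This code requires an active Snowflake session, so it runs only inside the notebook.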
Use a Snowflake Notebook that runs in a workspace¶
For more information about Snowflake Notebooks in Workspaces, see Snowflake Notebooks in Workspaces.
Create a PyPI external access integration.
You must use the ACCOUNTADMIN role and have a database you can access.
Run the following commands from a SQL file in a workspace.
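The commands might look like the following sketch; the names `pypi_network_rule` and `pypi_access_integration` are illustrative, and the exact host list may vary:

```sql
-- Network rule allowing egress to PyPI hosts (names are illustrative).
CREATE OR REPLACE NETWORK RULE pypi_network_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('pypi.org', 'pypi.python.org', 'pythonhosted.org', 'files.pythonhosted.org');

-- External access integration that the notebook can reference.
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION pypi_access_integration
  ALLOWED_NETWORK_RULES = (pypi_network_rule)
  ENABLED = TRUE;
```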
Enable PyPI integration in a notebook.
In the notebook, for Service name, select a service.
For External access integrations, select the PyPI integration you created.
For Python version, select Python 3.11.
Select Create.
Install the snowpark_connect package from PyPI in the notebook, using code such as the following:
Restart the kernel.
From the Connect button, select Restart kernel.
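The install step above can run from a notebook cell; the PyPI package name matches the snowpark-connect package listed for warehouse notebooks:

```shell
# In a notebook cell, prefix the command with "!" to run it as a shell command.
pip install snowpark-connect
```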
Start the snowpark_connect server using code such as the following:
Run your Spark code, as shown in the following example:
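The last two steps might look like the following sketch, assuming the snowpark_connect package exposes start_session and get_session entry points (adjust to the version you installed):

```python
import snowpark_connect

# Start the Snowpark Connect for Spark server and session.
snowpark_connect.start_session()

# Get a Spark session backed by Snowflake, then run Spark code against it.
spark = snowpark_connect.get_session()
spark.sql("SHOW SCHEMAS").limit(10).show()
```

This code requires an active Snowflake session, so it runs only inside the notebook.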
