Run Spark workloads from Snowflake Notebooks¶
You can run Spark workloads interactively from Snowflake Notebooks without needing to manage a Spark cluster. The workloads run on Snowflake infrastructure.
To use Snowflake Notebooks as a client for developing Spark workloads to run on Snowflake:
Launch Snowflake Notebooks.
Within the notebook, start a Spark session.
Write PySpark code to load, transform, and analyze data, such as filtering high-value customer orders or aggregating revenue, as shown in the sketch after these steps.
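For example, after the session is started, a transformation like the following sketch filters high-value orders and aggregates revenue per customer. This is an illustration only: the table name orders and the columns customer_id and order_total are placeholders, and the sketch assumes that the PySpark client is available in the notebook environment. Substitute a table from your own database and schema.

from snowflake import snowpark_connect
from pyspark.sql import functions as F

# Start a Spark session that runs on Snowflake through Snowpark Connect for Spark.
spark = snowpark_connect.server.init_spark_session()

# Placeholder table and column names; replace them with your own data.
orders = spark.table("orders")

# Filter high-value customer orders.
high_value = orders.filter(F.col("order_total") > 1000)

# Aggregate revenue per customer.
revenue = high_value.groupBy("customer_id").agg(
    F.sum("order_total").alias("total_revenue")
)

revenue.show()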
Use a Snowflake Notebook that runs on a warehouse¶
For more information about Snowflake Notebooks, see Create a notebook.
Create a Snowflake Notebook by completing the following steps:
Sign in to Snowsight.
From the navigation menu, select + Create » Notebook » New Notebook.
In the Create notebook dialog, enter a name, database, and schema for the new notebook.
For more information, see Create a notebook.
For Runtime, select Run on warehouse.
For Runtime version, select Snowflake Warehouse Runtime 2.0.
Selecting version 2.0 ensures that you have the dependency support you need, including Python 3.10. For more information, see Notebook runtimes.
For Query warehouse and Notebook warehouse, select the warehouses for running query code and for running the kernel and Python code, as described in Create a notebook.
Select Create.
In the notebook you created, under Packages, ensure that the following packages are listed to support the code in your notebook:
Python, version 3.10 or later
snowflake-dataframe-processor, latest version
If you need to add these packages, use the following steps:
Under Anaconda Packages, type the package name in the search box.
Select the package name.
Select Save.
To connect to the Snowpark Connect for Spark server and test the connection, copy the following code and paste it into a Python cell of the notebook you created:
# Start a Spark session that runs through Snowpark Connect for Spark.
from snowflake import snowpark_connect

spark = snowpark_connect.server.init_spark_session()

# Run a quick test query to verify the connection.
df = spark.sql("show schemas").limit(10)
df.show()
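If the connection succeeds, the cell prints up to ten schemas. As a further check that DataFrame operations run through the session, you can build a small DataFrame from inline sample data, as in the following sketch; the column names and values are illustrative only.

# Build a DataFrame from inline sample data and run a simple filter
# to confirm that DataFrame operations execute through the session.
sample = spark.createDataFrame(
    [("C1", 1200.0), ("C2", 80.0)],
    ["customer_id", "order_total"],
)
sample.filter(sample.order_total > 1000).show()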