Tutorial: Get started with Snowpark Connect for Spark
This tutorial walks you through a complete Snowpark Connect for Spark workflow using a local IDE. You’ll create a
source table, read data into a Spark DataFrame, apply transformations with user-defined functions,
save results to a table and a file, and verify the output using the SnowflakeSession class.
Choose the tab for your language in each step to follow along in Python, Java, or Scala.
Note
Each step builds on the previous one to form a single end-to-end example. Pick one language tab
and follow it through all steps. For Python, you can run each step individually in a REPL or
notebook. For Java or Scala, combine all steps into one Tutorial.java or Tutorial.scala
file before running.
Prerequisites
Complete the environment setup for your language before starting this tutorial. You need a Snowflake account with access to Snowpark Connect for Spark and a database and schema to work in.
Step 1: Connect and create a source table
Start a session and create a table with sample sales data.
Note
The Java client for Snowpark Connect for Spark is a preview feature.
Note
The Scala client for Snowpark Connect for Spark is a preview feature.
Step 2: Read from the table and apply transformations
Read the data into a DataFrame, normalize the region column to uppercase, and apply a tiered
tax rate based on the sale amount. Orders under $50 are taxed at 5%, orders between $50 and $150
at 10%, and orders over $150 at 15%. This kind of multi-bracket business logic is a natural fit for
a UDF. The Python example registers a UDF, while Java and Scala use typed Dataset map operations.
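As a minimal sketch of the bracket logic described above, the plain Python functions below could serve as the body of such a UDF. The function names are hypothetical, and the bracket boundaries are an assumption: the text does not say which bracket an order of exactly $50 or $150 falls into, so this sketch places both in the 10% bracket.

```python
def normalize_region(region):
    """Uppercase a region name; pass None through so SQL NULLs stay NULL."""
    return None if region is None else region.upper()


def tax_rate(amount):
    """Return the tiered tax rate for a sale amount.

    Assumption: the 10% bracket includes both endpoints ($50 and $150).
    """
    if amount is None:
        return None
    if amount < 50:
        return 0.05
    if amount <= 150:
        return 0.10
    return 0.15


def tax_amount(amount):
    """Return the tax owed on a sale, rounded to cents."""
    rate = tax_rate(amount)
    return None if rate is None else round(amount * rate, 2)
```

Because these functions use no Spark types, they can be unit tested locally before being registered, for example with `pyspark.sql.functions.udf` and a `DoubleType` return type.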
The output looks like this:
Step 3: Save results to a table and a file
Write the transformed data to a new Snowflake table and export it to a Parquet file on an internal stage.
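A minimal Python sketch of the two writes, using the standard Spark `DataFrameWriter` calls `mode`, `saveAsTable`, and `parquet`. The helper name `save_results`, the `overwrite` mode, and the idea of passing the stage location as an `@stage/path` style string are assumptions made for illustration; substitute the table name and stage path you actually use.

```python
def save_results(df, table_name, stage_path):
    """Write a DataFrame to a table and to Parquet files under stage_path.

    Assumptions: df is the transformed Spark DataFrame from Step 2;
    table_name and stage_path are hypothetical values chosen by the caller.
    """
    # Overwrite the table on each run so the tutorial can be repeated.
    df.write.mode("overwrite").saveAsTable(table_name)
    # Export the same rows as Parquet files to the stage location.
    df.write.mode("overwrite").parquet(stage_path)
```

Keeping both writes in one helper makes it easy to rerun the step, since each call replaces the previous output instead of appending to it.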
Step 4: Verify with the SnowflakeSession class
Use SnowflakeSession to run a Snowflake-native aggregation query against the output table and
list the staged Parquet files written in Step 3. This verifies that both outputs were created and
demonstrates how to use SnowflakeSession for Snowflake-specific SQL.
The staged file listing shows the Parquet files written to the stage:
The table aggregation output looks like this:
Step 5: Run the tutorial
Clean up
Remove the tables and stage files created during this tutorial:
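As a rough sketch of that cleanup, the hypothetical Python helper below builds the SQL statements rather than running them. `DROP TABLE IF EXISTS` and `REMOVE` are standard Snowflake commands; the helper name and its arguments are assumptions, and you supply the table names and stage path you created.

```python
def cleanup_statements(table_names, stage_path):
    """Build SQL that drops the tutorial tables and removes staged files.

    Assumptions: table_names and stage_path are hypothetical inputs that
    match whatever names were used in the earlier steps.
    """
    # One DROP per table; IF EXISTS keeps the cleanup safe to rerun.
    statements = [f"DROP TABLE IF EXISTS {name}" for name in table_names]
    # REMOVE deletes the files under the given stage path.
    statements.append(f"REMOVE {stage_path}")
    return statements
```

`REMOVE` is Snowflake-specific, so it probably needs to run through SnowflakeSession, not Spark SQL; treat that as an assumption to check against your environment.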
Next steps
- Learn more about executing Snowflake SQL, including SnowflakeSession for Snowflake-specific syntax.
- Explore user-defined functions for Python, Java, and Scala UDFs and UDTFs.
- Read and write files with File I/O.
- Configure session behavior with parameters.