Samples: File transformation with Snowpark Connect for Spark
This page shows how to use a Python UDF to transform files stored on a Snowflake stage. The UDF
reads a file from the stage, applies a transformation, and writes the result to a new file. You
then copy the converted files back to the stage using COPY FILES.
This pattern is useful for workloads such as:
- Converting between file formats.
- Resizing images.
- Transforming files into a “golden state” in a timestamped folder.
The example assumes you’ve completed the local IDE setup and have a connection entry configured in ~/.snowflake/connections.toml.
Example: Transform a staged file with a UDF
This example defines a UDF that reads a text file from a stage, appends foo to the end of each
line, and writes the result to a new file. A COPY FILES command, run through SnowflakeSession,
then copies the converted file back to the stage.
The UDF uses SnowflakeFile
from the snowflake-snowpark-python package, which provides read and write access to files on
Snowflake stages from within UDF handlers.
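The handler's core logic is ordinary Python file I/O, so it can be isolated and tested locally. The following is a minimal sketch under stated assumptions, not the original sample: the names append_suffix_to_lines and convert_file_local, and the open_input/open_result parameters, are hypothetical, and io.StringIO stands in for the file objects that SnowflakeFile.open() and SnowflakeFile.open_new_result() would return inside Snowflake.

```python
import io
from typing import IO, Callable

# Hypothetical injection points. In a real handler these would wrap
# SnowflakeFile.open() and SnowflakeFile.open_new_result(); here they can be
# any callables that return text file objects, such as io.StringIO.
InputOpener = Callable[[str], IO[str]]
ResultFactory = Callable[[], IO[str]]


def append_suffix_to_lines(src: IO[str], dst: IO[str], suffix: str = "foo") -> int:
    """Write every line of src to dst with `suffix` appended; return the line count."""
    count = 0
    for line in src:
        # Split off the line ending so the suffix goes before it, not after it.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        dst.write(body + suffix + ending)
        count += 1
    return count


def convert_file_local(path: str, open_input: InputOpener, open_result: ResultFactory) -> IO[str]:
    """Read `path`, write the transformed text to a new result file, and return it."""
    with open_input(path) as src:
        dst = open_result()
        append_suffix_to_lines(src, dst)
    return dst


if __name__ == "__main__":
    # An in-memory dict plays the role of the stage.
    fake_stage = {"@my_stage/input.txt": "alpha\nbeta\n"}
    result = convert_file_local(
        "@my_stage/input.txt",
        lambda p: io.StringIO(fake_stage[p]),
        io.StringIO,
    )
    # Prints "alphafoo" and "betafoo" on separate lines.
    print(result.getvalue(), end="")
```

Inside a real UDF handler you would supply SnowflakeFile-based openers instead of the in-memory fakes; check the SnowflakeFile reference for the exact arguments that open() and open_new_result() accept.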
Important
You must mark the UDF as nondeterministic with .asNondeterministic(). Calling
SnowflakeFile.open_new_result() requires the function to be mutable (volatile). Snowflake
only allows mutable file operations inside nondeterministic UDFs.
How it works
- Configure packages: The snowpark.connect.udf.packages setting makes the snowflake-snowpark-python package available inside the UDF execution environment on Snowflake. This gives the handler access to the SnowflakeFile class.
- Define the UDF: The convert_file function opens the input file with SnowflakeFile.open() and creates a new output file with SnowflakeFile.open_new_result(). After processing, the function returns the result file, which produces a scoped URL pointing to the converted file.
- Mark as nondeterministic: SnowflakeFile.open_new_result() requires the function to be mutable (volatile). Snowflake only allows mutable file operations inside nondeterministic UDFs, so you must call .asNondeterministic() on the UDF.
- Run the UDF: A DataFrame containing the stage file path is passed through the UDF. The result contains the scoped URL of the converted file alongside the original filename.
- Copy results to stage: SnowflakeSession provides access to the underlying Snowflake session so you can run SQL commands. The COPY FILES statement copies the converted files from their temporary location to your target stage.
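The UDF output pairs each converted file's scoped URL with its original filename, and those pairs feed the COPY FILES statement. The sketch below builds such a statement as a string. It assumes the query form of COPY FILES, in which the FROM clause is a SELECT returning an existing file URL followed by a new filename; confirm that syntax in the COPY FILES reference before relying on it. The helper names and the @my_stage/converted/ target are hypothetical, and inlining the URLs as literals is only one option (selecting them from a table that holds the UDF output is another).

```python
from typing import Iterable, Tuple


def quote_sql_string(value: str) -> str:
    """Render `value` as a single-quoted SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_files_sql(target: str, rows: Iterable[Tuple[str, str]]) -> str:
    """Build a COPY FILES statement from (scoped_url, file_name) pairs.

    Assumes the query form of COPY FILES, whose SELECT returns the existing
    file URL followed by the new file name. `target` is inserted verbatim,
    so it must come from trusted code, not user input.
    """
    values = ", ".join(
        f"({quote_sql_string(url)}, {quote_sql_string(name)})" for url, name in rows
    )
    if not values:
        raise ValueError("at least one (scoped_url, file_name) row is required")
    # Snowflake names the columns of an inline VALUES clause column1, column2, ...
    return (
        f"COPY FILES INTO {target} "
        f"FROM (SELECT column1, column2 FROM VALUES {values})"
    )


if __name__ == "__main__":
    # Hypothetical UDF output: placeholder URLs, not real scoped URLs.
    udf_rows = [
        ("https://example.invalid/scoped/1", "input.txt"),
        ("https://example.invalid/scoped/2", "notes.txt"),
    ]
    print(build_copy_files_sql("@my_stage/converted/", udf_rows))
```

You would then run the resulting statement through the session that SnowflakeSession gives you access to.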
Note
The scoped URL returned by the UDF is valid for 24 hours. Run the COPY FILES statement
within that window to persist the converted files.
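If the UDF and the copy run as separate steps, it can help to check the window before copying. A minimal sketch, assuming you record a timezone-aware timestamp when the UDF runs; the function names are hypothetical. It treats the 24-hour mark itself as expired, which is a deliberately conservative choice rather than documented behavior.

```python
from datetime import datetime, timedelta, timezone

# Validity period of a scoped URL returned by the UDF, per the note above.
SCOPED_URL_TTL = timedelta(hours=24)


def copy_deadline(issued_at: datetime) -> datetime:
    """Latest moment by which COPY FILES should have run for URLs issued at `issued_at`."""
    if issued_at.tzinfo is None:
        raise ValueError("issued_at must be timezone-aware")
    return issued_at + SCOPED_URL_TTL


def can_still_copy(issued_at: datetime, now: datetime) -> bool:
    """Conservatively treat the exact 24-hour mark as already expired."""
    return now < copy_deadline(issued_at)


if __name__ == "__main__":
    issued = datetime.now(timezone.utc)
    print("copy before:", copy_deadline(issued).isoformat())
    print("still valid:", can_still_copy(issued, datetime.now(timezone.utc)))
```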
Related topics
- File transformation with Snowpark Python UDFs for the equivalent pure Snowpark Python approach.
- SnowflakeFile class reference for details on reading and writing files from UDF handlers.
- User-defined functions with Snowpark Connect for Spark for UDF configuration options, package management, and best practices.
- File I/O with Snowpark Connect for Spark for reading and writing files on stages.