The Snowpark library provides an intuitive API for querying and processing data in a data pipeline. Using this library, you can build applications that process data in Snowflake without moving data to the system where your application code runs. Snowpark has several features that distinguish it from other client libraries:

  • The Snowpark API provides programming language constructs for building SQL statements. For example, the API provides a select method that you can use to specify the column names to return, rather than writing 'select column_name' as a string.

    Although you can still use a string to specify the SQL statement to execute, you benefit from features like intelligent code completion and type checking when you use the native language constructs provided by Snowpark.

  • Snowpark operations are executed lazily on the server, which reduces the amount of data transferred between your client and the Snowflake database.

    The core abstraction in Snowpark is the DataFrame, which represents a set of data and provides methods to operate on that data. In your client code, you construct a DataFrame object and set it up to retrieve the data that you want to use (for example, the columns containing the data and the filter to apply to rows).

    The data isn’t retrieved when you construct the DataFrame object. Instead, when you are ready to retrieve the data, you perform an action that evaluates the DataFrame objects and sends the corresponding SQL statements to the Snowflake database for execution.

  • You can create user-defined functions (UDFs) in your code, and Snowpark can push your code to the server, where the code can operate on the data.

    You can write functions in the same language that you use to write your client code (for example, by using anonymous functions in Scala or by using lambda functions in Python). To use these functions to process data in the Snowflake database, you define them as UDFs and call those UDFs in your code.

    Snowpark automatically pushes the custom code for UDFs to the Snowflake database. When you call the UDF in your client code, your custom code is executed on the server (where the data is). You don’t need to transfer the data to your client in order to execute the function on the data.
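
The three features above can be illustrated with a small, self-contained sketch. This is not the Snowpark API: it is a toy model that uses Python's built-in sqlite3 module as a stand-in for the database, and the names LazyFrame, sample_product_data, and double_price are hypothetical. It shows a query built with method calls instead of a SQL string, no statement being sent until an action (collect) is called, and a Python function registered with the engine so that it runs as part of the query rather than on rows fetched to the client.

```python
import sqlite3


class LazyFrame:
    """Toy stand-in for a DataFrame: records operations, runs nothing until collect()."""

    def __init__(self, conn, table, columns=("*",), predicates=()):
        self._conn = conn
        self._table = table
        self._columns = tuple(columns)
        self._predicates = tuple(predicates)

    def select(self, *columns):
        # Returns a new frame describing the projection; no SQL is sent.
        return LazyFrame(self._conn, self._table, columns, self._predicates)

    def filter(self, predicate):
        # Returns a new frame with one more row filter; no SQL is sent.
        return LazyFrame(self._conn, self._table, self._columns,
                         self._predicates + (predicate,))

    def to_sql(self):
        # Translate the recorded operations into a single SQL statement.
        sql = f"SELECT {', '.join(self._columns)} FROM {self._table}"
        if self._predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self._predicates)
        return sql

    def collect(self):
        # The action: only here is the statement sent to the database.
        return self._conn.execute(self.to_sql()).fetchall()


# Set up an in-memory database that plays the role of the server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_product_data (id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO sample_product_data VALUES (?, ?, ?)",
    [(1, "widget", 2.5), (2, "gadget", 10.0), (3, "gizmo", 7.25)],
)
conn.commit()

# Register a Python lambda with the engine, loosely analogous to defining a UDF.
conn.create_function("double_price", 1, lambda price: price * 2)

# Record every statement the database executes from this point on.
executed = []
conn.set_trace_callback(executed.append)

# Build the query with method calls; nothing has been executed yet.
df = (LazyFrame(conn, "sample_product_data")
      .filter("price > 5")
      .select("name", "double_price(price)"))
statements_before_collect = list(executed)

generated_sql = df.to_sql()
rows = df.collect()  # the action triggers execution
```

In this sketch, statements_before_collect stays empty because select and filter only describe the query, and generated_sql holds the one statement that collect sends. Real Snowpark DataFrames follow the same pattern at a much larger scale, with the SQL executed in Snowflake rather than in the client process.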

In comparison to the Snowflake Connector for Spark, Snowpark provides the following benefits:

  • Snowpark supports pushdown for all operations, including Snowflake UDFs.

  • Snowpark does not require a separate cluster outside of Snowflake for computations. All of the computations are done within Snowflake.
