Changes to the Snowpark API¶
This topic summarizes the changes made to the Snowpark API.
Version 0.6.0 of the Snowpark library introduces some API changes, new features, improvements, and bug fixes.
In this release, the following methods in RelationalGroupedDataFrame now require an argument:

In previous releases, if you called these methods without an argument, the methods were applied to all numeric columns in the DataFrame. For example, for a DataFrame df with the columns (a int, b string, c float), calling df.groupBy("a").max() was equivalent to calling df.groupBy("a").max(col("a"), col("c")), because a and c are the numeric columns.

With this release, calling these methods without an argument results in an error.
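As a sketch of the change (the connection properties file and query are placeholders, not part of the release notes):

```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions.col

// "profile.properties" is a placeholder path for your connection configuration.
val session = Session.builder.configFile("profile.properties").create
val df = session.sql("select 1 as a, 'x' as b, 1.5 as c")

// Before 0.6.0, this compiled and aggregated all numeric columns:
// df.groupBy("a").max()
// From 0.6.0 on, the columns must be passed explicitly:
df.groupBy("a").max(col("a"), col("c")).show()
```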
Version 0.5.0 of the Snowpark library introduces some new features, improvements, and bug fixes.
Added a maxWidth parameter to the DataFrame.show() method. You can use this parameter to adjust the number of characters printed in the output for each column.
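For example (a sketch; the connection properties file and query are placeholders):

```scala
import com.snowflake.snowpark._

// "profile.properties" is a placeholder path for your connection configuration.
val session = Session.builder.configFile("profile.properties").create
val df = session.sql("select 'a fairly long string value' as c1")

// Print up to 10 rows, limiting each column's printed width to 20 characters.
df.show(10, 20)
```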
Added the Session.cancelAll() method, which you can use to cancel all running actions on this session.
Added the DataFrame.toLocalIterator() method, which returns an iterator that you can use to retrieve data row by row. You can use this rather than DataFrame.collect() if you don't want to load all of the data into memory at once.
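A minimal sketch of iterating row by row (connection configuration and query are placeholders):

```scala
import com.snowflake.snowpark._

// "profile.properties" is a placeholder path for your connection configuration.
val session = Session.builder.configFile("profile.properties").create
val df = session.sql("select seq4() as n from table(generator(rowcount => 1000))")

// Fetch rows lazily instead of materializing them all with collect().
val rows: Iterator[Row] = df.toLocalIterator
rows.foreach(println)
```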
Added the median method to the
Improved the error message returned when an identifier is invalid.
Enhanced the error checking to report an error when no database or schema name is specified.
Improved the performance of inserting a large number of values into a table.
Updated the library to handle Snowflake object identifiers (table and view names) consistently. All parameters that specify table or view names now support the use of:
Short names (e.g. view_name)
Fully-qualified names (e.g. database.schema.view_name)
Multi-part identifiers (e.g. Seq("database", "schema", "view_name"))
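For instance, all three forms can be passed to Session.table (a sketch; the database, schema, and view names, and the connection file, are placeholders):

```scala
import com.snowflake.snowpark._

// "profile.properties" is a placeholder path for your connection configuration.
val session = Session.builder.configFile("profile.properties").create

// All three forms refer to the same view (names are placeholders):
val short     = session.table("view_name")
val qualified = session.table("database.schema.view_name")
val multipart = session.table(Seq("database", "schema", "view_name"))
```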
Added a check to verify that a supported version of Scala is being used. The library reports an error if the Scala version is not compatible.
Fixed a problem with registering UDFs on Microsoft Windows.
Fixed a problem with the order of results when using Session.range() to generate a sequence of numbers without gaps.
In this version, you no longer need to specify a temporary schema or temporary database for Snowpark objects (the TEMP_DB settings). The Snowpark library automatically creates temporary versions of the objects that it needs.
In addition, this version introduces the API changes listed in the next sections.
Replaced the DataFrame.cache() method with the DataFrame.cacheResult() method.
The new method creates and returns a new DataFrame with the cached results and has no effect on the current DataFrame. As a result of this change, the DataFrame object is now immutable.
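A sketch of the new behavior (the query and connection file are placeholders):

```scala
import com.snowflake.snowpark._

// "profile.properties" is a placeholder path for your connection configuration.
val session = Session.builder.configFile("profile.properties").create
val df = session.sql("select 1 as a")

// cacheResult() returns a new DataFrame backed by the cached results;
// df itself is unchanged.
val cached = df.cacheResult()
cached.show()
```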
Added the following new methods to the RelationalGroupedDataFrame class:
Added the following new methods to the DataFrame class:
Added the following new functions to the functions object:
Added a Session.file object, which provides the following new methods for performing file operations:
Made the following changes to the Session.createDataFrame method:
Added support for user-provided schemas.
Added support for specifying an array/map of variant/geography data.
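For example, a user-provided schema can be passed when building a DataFrame from local data (a sketch; the column names and values are illustrative):

```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.types._

// "profile.properties" is a placeholder path for your connection configuration.
val session = Session.builder.configFile("profile.properties").create

// Describe the columns explicitly rather than letting the library infer them.
val schema = StructType(
  StructField("id", IntegerType),
  StructField("name", StringType)
)
val df = session.createDataFrame(Seq(Row(1, "alice"), Row(2, "bob")), schema)
df.show()
```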
Added support for Geography/Variant data types in UDFs.
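As a sketch of a UDF over Variant data (the logic, query, and connection file are illustrative assumptions, not part of the release notes):

```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions.{udf, col}
import com.snowflake.snowpark.types.Variant

// "profile.properties" is a placeholder path for your connection configuration.
val session = Session.builder.configFile("profile.properties").create

// A UDF that takes a Variant argument and returns its string form.
val variantToString = udf((v: Variant) => v.asString())

val df = session.sql("select to_variant(42) as v")
df.select(variantToString(col("v"))).show()
```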
Fixed a problem that occurred when a DataFrame column name contained quotation marks.
Fixed a problem with the inability to escape data that contains backslashes, single quotes, and newline characters.
Fixed a problem where UDF creation failed with the error message "code too large".
Fixed a problem where the UDF closure failed to capture the value of a local string variable.
Added the result schema for the following SQL clauses:
Fixed a problem when using Snowpark in Visual Studio Code with the Metals extension to create a UDF.