Changes to the Snowpark API

This topic summarizes the changes made to the Snowpark API.

Versions After 0.7.0

For the changes made in versions of the library after 0.7.0, see the Client Release History on the Snowflake Community site.

Version 0.7.0

Version 0.7.0 of the Snowpark library introduces some API changes, new features, improvements, and bug fixes.

New APIs

  • Introduced the new Session.close() method. Call this method to close the Snowpark session, which cancels all running queries and prevents the subsequent use of this session to execute queries.

  • Introduced the new Updatable class. Updatable extends the DataFrame class and provides additional table-related capabilities (e.g. the ability to update and delete values).

    The Session.table() method now returns an Updatable object, rather than a DataFrame object.

  • Introduced new signatures for the registerTemporary methods in the UDFRegistration class. These signatures do not have a parameter for the name of the UDF, which means that you can use these to register an anonymous temporary UDF.

API Changes

  • As mentioned in New APIs, the Session.table() method now returns an Updatable object, which extends DataFrame.

  • In the Geography class, removed support for formats other than GeoJSON. Now, Geography only supports the GeoJSON data format.

Improvements

  • Improved the DataFrame.cacheResult() method to reduce the possibility of “object already exists” errors.

  • Improved some error messages.

  • Added a new log message that prints out session information after you log in.

Bug Fixes

  • Fixed an issue in which the DataFrame.show() method did not display binary data correctly.

  • Fixed an error that occurred when getting the version number.

Version 0.6.0

Version 0.6.0 of the Snowpark library introduces some API changes, new features, improvements, and bug fixes.

API Changes

In this release, the following methods in RelationalGroupedDataFrame now require an argument:

  • avg

  • max

  • median

  • min

  • sum

In previous releases, if you called these methods without an argument, these methods were applied to all numeric columns in the DataFrame. For example, for a DataFrame df with the columns (a int, b string, c float), calling df.groupBy("a").max() was equivalent to calling df.groupBy("a").max(col("a"), col("c")).

With this release, calling these methods without an argument results in a SnowparkClientException.
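As a brief sketch of the change, code that previously relied on the no-argument form can name the numeric columns explicitly. The helper name maxPerGroup is hypothetical, and df is assumed to have the columns (a int, b string, c float) from the example above.

```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

object GroupedAggregationSketch {
  // Hypothetical helper. `df` is assumed to have the columns
  // (a int, b string, c float).
  def maxPerGroup(df: DataFrame): DataFrame =
    // Pass the columns explicitly; calling max() with no arguments
    // now results in a SnowparkClientException.
    df.groupBy("a").max(col("a"), col("c"))
}
```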

Version 0.5.0

Version 0.5.0 of the Snowpark library introduces some new features, improvements, and bug fixes.

New Features

  • Added a maxWidth parameter to the DataFrame.show() method. You can use this parameter to adjust the number of characters printed in the output for each column.

  • Added the Session.cancelAll() method, which you can use to cancel all running actions on this session.

  • Added the DataFrame.toLocalIterator() method, which returns an iterator that you can use to retrieve data, row by row. You can use this rather than DataFrame.collect(), if you don’t want to load all of the data into memory at once.

  • Added the median method to the RelationalGroupedDataFrame class.
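The sketch below illustrates two of these features: the maxWidth parameter of DataFrame.show() and row-by-row retrieval with DataFrame.toLocalIterator(). The helper name preview and the specific row and width values are hypothetical. The two-argument form show(n, maxWidth) is assumed from the Snowpark Scala API reference.

```scala
import com.snowflake.snowpark._

object Version050Sketch {
  // Hypothetical helper that previews a DataFrame and then streams its rows.
  def preview(df: DataFrame): Unit = {
    // Print up to 10 rows, with up to 100 characters for each column.
    df.show(10, 100)

    // Retrieve the rows one at a time instead of loading them all
    // into memory with collect().
    val rows: Iterator[Row] = df.toLocalIterator
    rows.foreach(row => println(row))
  }
}
```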

Improvements

  • Improved the error message returned when an identifier is invalid.

  • Enhanced the error checking to report an error when no database or schema name is specified.

  • Improved performance when inserting a large number of values into a table.

  • Updated the library to consistently handle Snowflake object identifiers (table and view names). Now, all parameters that specify table or view names support the use of:

    • Short names (e.g. table_name and view_name)

    • Fully-qualified names (e.g. database.schema.table_name)

    • Multi-part identifiers (e.g. Seq("database", "schema", "view_name"))

  • Added a check to verify that the supported version of Scala is being used. The library reports an error if the Scala version is not compatible.
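As an illustration of the three identifier forms listed above, the following sketch passes each one to Session.table(). The helper name openAll is hypothetical, the object names are the placeholders used in the list, and the session is assumed to be already connected.

```scala
import com.snowflake.snowpark._

object IdentifierFormsSketch {
  // Hypothetical helper. The table and view names are placeholders.
  def openAll(session: Session): Seq[DataFrame] = Seq(
    // Short name, resolved against the current database and schema.
    session.table("table_name"),
    // Fully-qualified name.
    session.table("database.schema.table_name"),
    // Multi-part identifier.
    session.table(Seq("database", "schema", "view_name"))
  )
}
```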

Bug Fixes

  • Fixed a problem with registering UDFs on Microsoft Windows.

  • Fixed a problem with the order of results when using DataFrame.sort() with DataFrame.limit().

  • Fixed Session.range() to generate a sequence of numbers without gaps.

Version 0.4.1

In this version, you no longer need to specify a temporary schema or temporary database for Snowpark objects (the TEMP_SCHEMA and TEMP_DB settings). The Snowpark library automatically creates temporary versions of the objects needed.

In addition, this version introduces the API changes listed in the next sections.

API Changes

  • Replaced the DataFrame.cache() method with the DataFrame.cacheResult() method.

    The new method creates and returns a new DataFrame with the cached results and has no effect on the current DataFrame. As a result of this change, the DataFrame object is now immutable.
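A minimal sketch of the new pattern follows. Because cacheResult() leaves the original DataFrame unchanged, the returned DataFrame is the one to use for subsequent operations. The helper name cachedPositive and the column a are hypothetical.

```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

object CacheResultSketch {
  // Hypothetical helper. `df` is assumed to have a numeric column named "a".
  def cachedPositive(df: DataFrame): DataFrame = {
    // cacheResult() returns a new DataFrame backed by the cached results.
    // `df` itself is not modified.
    val cached: DataFrame = df.cacheResult()

    // Build further operations on the cached DataFrame.
    cached.filter(col("a") > lit(0))
  }
}
```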

New APIs

Bug Fixes

  • Fixed a problem when the DataFrame column name contains quotation marks.

  • Fixed a problem in which data containing backslashes, single quotes, and newline characters could not be escaped.

  • Fixed a problem where UDF creation failed with the error message “code too large”.

  • Fixed a problem where the UDF closure failed to capture the value of a local string variable.

  • Added the result schema for the following SQL clauses:

    • GRANT/REVOKE

    • DESCRIBE

    • CREATE

    • USE

  • Fixed a problem when using Snowpark in Visual Studio Code with the Metals extension to create a UDF.