Snowpark Library for Scala and Java release notes for 2022

This article contains the release notes for the Snowpark Library for Scala and Snowpark Library for Java, including the following when applicable:

  • Behavior changes

  • New features

  • Customer-facing bug fixes

Snowflake uses semantic versioning for Snowpark Library for Scala and Java updates.

Version 1.6.2 (October 26, 2022)

Compatible Snowflake release: 6.35.x

Improvements

  • Made internal improvements for stored procedures written in Java or Scala.

Version 1.6.1 (September 30, 2022)

Compatible Snowflake release: 6.31.x

This version has a known issue that might break the creation of temporary objects. Use version 1.6.2 instead.

Improvements

  • Made internal improvements for stored procedures written in Java or Scala.

Version 1.6.0 (August 12, 2022)

Compatible Snowflake release: 6.27.x

Improvements

  • Made internal improvements to UDTFs.

Version 1.5.0 (July 1, 2022)

Compatible Snowflake release: 6.22.x

New features

  • Added support to the Scala and Java APIs for writing DataFrames to files on a stage.

Improvements

  • Optimized the SQL queries generated by the Snowpark client library.

  • Improved the error message that is logged when the Snowpark library fails to resolve a column name in a DataFrame (e.g. when you attempt to access a column that does not exist).

Version 1.4.1 (May 26, 2022)

Compatible Snowflake release: 6.17.x

Changes

  • Updated the version of jackson-core and jackson-annotations that the Snowpark library depends on to 2.13.2.

  • Updated the version of jackson-databind that the Snowpark library depends on to 2.13.2.2.

  • Removed the jackson-core, jackson-databind, and jackson-annotations classes from the Snowpark JAR file.

    If you downloaded the .tar.gz / .zip file, the JAR files for the Jackson classes are now provided separately in the lib/ subdirectory (jackson-core-2.13.2.jar, jackson-databind-2.13.2.2.jar, and jackson-annotations-2.13.2.jar).

    If you are specifying the Snowpark library as a dependency in your pom.xml file and you want to depend on a different version of the Jackson libraries in your pom.xml, you can exclude the dependency on the Jackson libraries from the Snowpark library dependency.

Version 1.4.0 (April 28, 2022)

Compatible Snowflake release: 6.14.x

New features

  • Made the Snowpark Java API generally available on AWS and Azure.

    The API is still available as a preview feature on GCP.

  • Made the Snowpark Scala API generally available on Azure.

    Prior to this release, the API was only generally available on AWS. The API is still available as a preview feature on GCP.

  • Added a Java API for creating UDTFs. Note that this is a preview feature.

  • Added new APIs in Scala and Java for uploading data to and downloading data from a stage (FileOperation.uploadStream and FileOperation.downloadStream).

  • Added the DataFrameWriter.option method in Scala and Java for specifying how the columns in the DataFrame are mapped to the columns in the table. You can use the option method to specify that the DataFrameWriter should map columns by name rather than by column order.
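The following plain-Java sketch illustrates the difference that DataFrameWriter.option controls by mapping one DataFrame row to table columns first by position and then by name. It does not use Snowpark. The ColumnMapping class, its methods, and the sample columns are hypothetical, and the real option key and value accepted by DataFrameWriter.option are not shown here. Treating a missing column as an error is a choice made for this sketch only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration (not Snowpark code) of two ways to map a DataFrame row
// onto the columns of a target table.
final class ColumnMapping {

    // Position-based: the i-th DataFrame value goes into the i-th table column.
    static Map<String, Object> byIndex(List<String> tableCols, List<String> dfCols, List<Object> row) {
        if (dfCols.size() != tableCols.size()) {
            throw new IllegalArgumentException("column counts differ");
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < tableCols.size(); i++) {
            out.put(tableCols.get(i), row.get(i));
        }
        return out;
    }

    // Name-based: each table column receives the DataFrame value with the same name.
    static Map<String, Object> byName(List<String> tableCols, List<String> dfCols, List<Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String col : tableCols) {
            int i = dfCols.indexOf(col);
            if (i < 0) {
                // This sketch treats a missing column as an error.
                throw new IllegalArgumentException("no DataFrame column named " + col);
            }
            out.put(col, row.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> table = List.of("ID", "NAME");
        List<String> df = List.of("NAME", "ID"); // same columns, different order
        List<Object> row = List.of("alice", 1);
        System.out.println(byIndex(table, df, row)); // {ID=alice, NAME=1}
        System.out.println(byName(table, df, row));  // {ID=1, NAME=alice}
    }
}
```

When the DataFrame columns are in a different order from the table columns, position-based mapping puts values into the wrong columns. Name-based mapping does not have this problem.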

Improvements

  • Disabled the Closure Cleaner in Java sessions. The Closure Cleaner only works in Scala programs.

  • Improved Array and Map support in the Java Row API.

Version 1.3.0 (March 18, 2022)

Compatible Snowflake release: 6.8.x

New features

Version 1.2.0 (March 2, 2022)

Compatible Snowflake release: 6.5.x

New features

  • Added the Java API for Snowpark.

  • Added preview support in the Scala API for creating UDTFs.

  • Added a separate version of the library that complies with the security requirements of FIPS (Federal Information Processing Standard). You can download this library from:

    To point to the FIPS-compliant library from an sbt build file or Maven project, use snowpark-fips as the artifactId.

Version 1.1.0 (February 4, 2022)

Compatible Snowflake release: 6.2.x

Added support for writing stored procedures in Scala.

The API reference for this release is available in the Snowflake documentation and in a .zip or .tar.gz file in the Snowflake Client Repository.

Version 1.0.0 (January 26, 2022)

Compatible Snowflake release: 6.1.x

General availability (GA) release on AWS. (Snowpark is still a preview feature on Azure and GCP.)

The API reference for this release is available in a .zip or .tar.gz file in the Snowflake Client Repository.

Version 0.12.0 (January 4, 2022)

Compatible Snowflake release: 5.45.x

The API reference for this release is available in a .zip or .tar.gz file in the Snowflake Client Repository.

New features

  • Added the listagg function to the functions object.

  • Added support for UDFs with 11 to 22 arguments.

  • Added the any_value function to the RelationalGroupedDataFrame class.

Improvements

  • In the generated code for UDFs, replaced a static code block with an object instance function.

  • Reorganized error messages.

  • Changed the saveAsTable function so that a new table is not created in Append mode.

  • Improved the callUDF function to support any type of argument.

  • Changed the library to set the query tag at the statement level, rather than at the session level.

Version 0.11.0 (November 16, 2021)

Compatible Snowflake release: 5.45.x

The API reference for this release is available in a .zip or .tar.gz file in the Snowflake Client Repository.

New features

Improvements

  • Upgraded the Snowflake JDBC driver to 3.13.9.

  • Improved the error message reported when no current database is selected for use.

Version 0.10.1 (October 27, 2021)

Compatible Snowflake release: 5.38.x

The API reference for this release is available in a .zip or .tar.gz file in the Snowflake Client Repository.

Bug fixes

  • Fixed a problem with uploading files to a GCP stage where the wrong prefix was used.

  • Fixed a problem in which a 403 HTTP response was returned when accessing a pre-signed URL for GCP.

Version 0.10.0 (October 18, 2021)

Compatible Snowflake release: 5.37.x

The API reference for this release is available in a .zip or .tar.gz file in the Snowflake Client Repository.

New features

  • Added the new method dropDuplicates to the DataFrame class.

  • Added support for in expressions to the Column class (with the in method) and the functions object (with the in function).

  • Extended the Iterator returned by DataFrame.toLocalIterator to support the Closeable interface, which allows you to call the close method on the iterator.

  • Added support for the new configuration property snowpark_request_timeout_in_seconds. You can set this property in the configuration map or file to adjust the timeout that the library uses when uploading dependencies to a stage. By default, the timeout is 86400 seconds (1 day).
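Because the iterator returned by DataFrame.toLocalIterator supports Closeable, Java code can release it with try-with-resources, even when it stops reading early. The sketch below replaces the real Snowpark iterator with a hypothetical CloseableRowIterator class over an in-memory list, so it runs without a Snowflake connection. The try (it) form requires Java 9 or later.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for an iterator that is also Closeable; not a Snowpark class.
final class CloseableRowIterator implements Iterator<String>, Closeable {
    private final Iterator<String> source;
    private boolean closed = false;

    CloseableRowIterator(List<String> rows) {
        this.source = rows.iterator();
    }

    @Override
    public boolean hasNext() {
        return !closed && source.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return source.next();
    }

    // Marks the iterator as closed; a real implementation might release
    // the underlying result set here.
    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    // Reads only the first row; try-with-resources closes the iterator afterwards.
    static String firstRow(CloseableRowIterator it) {
        try (it) {
            return it.hasNext() ? it.next() : null;
        }
    }

    public static void main(String[] args) {
        CloseableRowIterator it = new CloseableRowIterator(List.of("row1", "row2"));
        System.out.println(firstRow(it));  // row1
        System.out.println(it.isClosed()); // true
    }
}
```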

Improvements

Behavior changes

  • Removed APIs intended only for Java from the Scala API.

  • Replaced the default logger log4j with SLF4J SimpleLogger.

Bug fixes

  • Updated the library to close unused statements automatically in order to reduce memory usage.

  • Fixed the column order in the result of the DataFrame.withColumns method.

Version 0.9.0 (September 20, 2021)

Compatible Snowflake release: 5.34.x

The API reference for this release is available in a .zip or .tar.gz file in the Snowflake Client Repository.

New features

Behavior changes

  • Changed the DataFrame.union() and DataFrame.unionByName() methods to use UNION, rather than UNION ALL.
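The difference between the two SQL operators can be sketched in plain Java. UNION ALL keeps every row from both inputs, and UNION removes duplicate rows from the combined result. The UnionSemantics class is hypothetical and only models the semantics on lists. It does not reflect how Snowpark executes a union.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Plain-Java illustration of SQL UNION vs UNION ALL semantics (not Snowpark code).
final class UnionSemantics {

    // UNION ALL: keep every row from both inputs, including duplicates.
    static <T> List<T> unionAll(List<T> left, List<T> right) {
        List<T> result = new ArrayList<>(left);
        result.addAll(right);
        return result;
    }

    // UNION: keep each distinct row once. SQL does not guarantee row order;
    // this sketch keeps first-seen order only to make the output predictable.
    static <T> List<T> union(List<T> left, List<T> right) {
        return new ArrayList<>(new LinkedHashSet<>(unionAll(left, right)));
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 2);
        List<Integer> b = List.of(2, 3);
        System.out.println(unionAll(a, b)); // [1, 2, 2, 2, 3]
        System.out.println(union(a, b));    // [1, 2, 3]
    }
}
```

After this change, code that relied on union() or unionByName() preserving duplicate rows may return fewer rows.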

Bug fixes

  • Fixed the error “SQL compilation error: Missing column specification” that could occur when the Snowpark library created a temporary view.

Version 0.8.0 (August 9, 2021)

Compatible Snowflake release: 5.30.x

The API reference for this release is available in a .zip or .tar.gz file in the Snowflake Client Repository.

Improvements

  • Refactored some internal code to remove some dependencies.

Bug fixes

  • Fixed an issue with BigDecimal literals in cases where scale might be larger than precision.

  • Fixed an issue that could occur when performing multiple set operations (e.g. union and intersect).

Version 0.7.0 (July 23, 2021)

Compatible Snowflake release: 5.29.x

The API reference for this release is available in a .zip or .tar.gz file in the Snowflake Client Repository.

New APIs

  • Introduced the new Session.close() method. Call this method to close the Snowpark session, which cancels all running queries and prevents the subsequent use of this session to execute queries.

  • Introduced the new Updatable class. Updatable extends the DataFrame class and provides additional table-related capabilities (e.g. the ability to update and delete values).

  • The Session.table() method now returns an Updatable object, rather than a DataFrame object.

  • Introduced new signatures for the registerTemporary methods in the UDFRegistration class. These signatures do not have a parameter for the name of the UDF, which means that you can use these to register an anonymous temporary UDF.

API Changes

  • As mentioned above, the Session.table() method now returns an Updatable object, which extends DataFrame.

  • In the Geography class, removed support for formats other than GeoJSON. Now, Geography only supports the GeoJSON data format.

Improvements

  • Improved the DataFrame.cacheResult() method to reduce the possibility of “object already exists” errors.

  • Improved some error messages.

  • Added a new log message that prints out session information after you log in.

Bug fixes

  • Fixed an issue in which the DataFrame.show() method did not display binary data correctly.

  • Fixed an error that occurred when getting the version number.

Version 0.6.0 (June 14, 2021)

Compatible Snowflake release: 5.21.x

Preview release on AWS

The API reference for this release is available in a .zip or .tar.gz file in the Snowflake Client Repository.

API Changes

In this release, the following methods in RelationalGroupedDataFrame now require an argument:

  • avg

  • max

  • median

  • min

  • sum

In previous releases, if you called these methods without an argument, these methods were applied to all numeric columns in the DataFrame. For example, for a DataFrame df with the columns (a int, b string, c float), calling df.groupBy("a").max() was equivalent to calling df.groupBy("a").max(col("a"), col("c")).

With this release, calling these methods without an argument results in a SnowparkClientException.
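The following is a minimal sketch of the new requirement, assuming rows are represented as plain Java maps. The hypothetical GroupedMax.max method computes a per-group maximum only for the columns you name, and it rejects a call that names none. IllegalArgumentException stands in for the SnowparkClientException that the library throws. None of these names are Snowpark APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Snowpark code: a grouped max that requires the caller
// to name the columns to aggregate.
final class GroupedMax {

    // Builds one row with the columns (a int, b string, c float) from the example above.
    static Map<String, Object> row(int a, String b, double c) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("a", a);
        m.put("b", b);
        m.put("c", c);
        return m;
    }

    // Returns, for each value of groupCol, the maximum of each named column.
    static Map<Object, Map<String, Double>> max(List<Map<String, Object>> rows,
                                                String groupCol, String... aggCols) {
        if (aggCols.length == 0) {
            // Stand-in for the SnowparkClientException the library now throws.
            throw new IllegalArgumentException("max() requires at least one column");
        }
        Map<Object, Map<String, Double>> result = new LinkedHashMap<>();
        for (Map<String, Object> r : rows) {
            Map<String, Double> acc =
                    result.computeIfAbsent(r.get(groupCol), k -> new LinkedHashMap<>());
            for (String col : aggCols) {
                double v = ((Number) r.get(col)).doubleValue();
                acc.merge(col, v, Math::max);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows =
                List.of(row(1, "x", 2.5), row(1, "y", 4.0), row(2, "z", 1.0));
        System.out.println(max(rows, "a", "c")); // {1={c=4.0}, 2={c=1.0}}
    }
}
```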

Version 0.5.0

New features

  • Added a maxWidth parameter to the DataFrame.show() method. You can use this parameter to adjust the number of characters printed in the output for each column.

  • Added the Session.cancelAll() method, which you can use to cancel all running actions on this session.

  • Added the DataFrame.toLocalIterator() method, which returns an iterator that you can use to retrieve data row by row. You can use this method instead of DataFrame.collect() if you don’t want to load all of the data into memory at once.

  • Added the median method to the RelationalGroupedDataFrame class.

Improvements

  • Improved the error message returned when an identifier is invalid.

  • Enhanced the error checking to report an error when no database or schema name is specified.

  • Improved performance when inserting a large number of values into a table.

  • Updated the library to consistently handle Snowflake object identifiers (table and view names). Now, all parameters that specify table or view names support the use of:

    • Short names (e.g. table_name and view_name)

    • Fully-qualified names (e.g. database.schema.table_name)

    • Multi-part identifiers (e.g. Seq("database", "schema", "view_name"))

  • Added a check to verify that a supported version of Scala is being used. The library reports an error if the Scala version is not compatible.
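The short, fully-qualified, and multi-part forms all name the same kinds of objects. The hypothetical helper below is not part of Snowpark. It joins a multi-part identifier into a fully-qualified name and double-quotes any part that cannot be used as an unquoted Snowflake identifier. This is a simplified sketch: it ignores reserved words and length limits, and it assumes that a part already enclosed in double quotes is correctly quoted.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Snowpark) that turns a multi-part identifier
// such as ["database", "schema", "view_name"] into a dotted, fully-qualified name.
final class Identifiers {
    // Unquoted Snowflake identifiers start with a letter or underscore and contain
    // only letters, decimal digits, underscores, and dollar signs.
    private static final Pattern UNQUOTED = Pattern.compile("[A-Za-z_][A-Za-z0-9_$]*");

    // Quotes a part only if it cannot be used as an unquoted identifier;
    // embedded double quotes are escaped by doubling them.
    static String quoteIfNeeded(String part) {
        boolean alreadyQuoted = part.length() >= 2 && part.startsWith("\"") && part.endsWith("\"");
        if (UNQUOTED.matcher(part).matches() || alreadyQuoted) {
            return part;
        }
        return "\"" + part.replace("\"", "\"\"") + "\"";
    }

    static String qualify(List<String> parts) {
        return parts.stream().map(Identifiers::quoteIfNeeded).collect(Collectors.joining("."));
    }

    public static void main(String[] args) {
        System.out.println(qualify(List.of("database", "schema", "view_name"))); // database.schema.view_name
        System.out.println(qualify(List.of("db", "my schema", "t")));            // db."my schema".t
    }
}
```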

Bug fixes

  • Fixed a problem with registering UDFs on Microsoft Windows.

  • Fixed a problem with the order of results when using DataFrame.sort() with DataFrame.limit().

  • Fixed Session.range() to generate a sequence of numbers without gaps.

Version 0.4.1

In this version, you no longer need to specify a temporary schema or temporary database for Snowpark objects (the TEMP_SCHEMA and TEMP_DB settings). The Snowpark library automatically creates temporary versions of the objects needed.

API Changes

Replaced the DataFrame.cache() method with the DataFrame.cacheResult() method.

The new method creates and returns a new DataFrame with the cached results and has no effect on the current DataFrame. As a result of this change, the DataFrame object is now immutable.
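The behavior of cacheResult can be illustrated with a hypothetical immutable Frame class, used here as a stand-in for DataFrame (it is not a Snowpark type). cacheResult returns a new object marked as cached and leaves the original unchanged. Where the real method stores its results is an assumption and is noted only in a comment.

```java
// Hypothetical immutable stand-in for a DataFrame; not a Snowpark class.
final class Frame {
    private final String query;   // SQL text that this frame represents
    private final boolean cached; // whether this frame reads from cached results

    Frame(String query, boolean cached) {
        this.query = query;
        this.cached = cached;
    }

    String query() {
        return query;
    }

    boolean isCached() {
        return cached;
    }

    // Returns a NEW frame; the fields of this frame are final and never change.
    // Assumption: a real implementation would first materialize the results
    // (for example, into a temporary table) and point the new frame at them.
    Frame cacheResult() {
        return new Frame(query, true);
    }

    public static void main(String[] args) {
        Frame original = new Frame("SELECT * FROM t", false);
        Frame cached = original.cacheResult();
        System.out.println(original.isCached()); // false
        System.out.println(cached.isCached());   // true
    }
}
```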

New APIs

  • Added the following new methods to the RelationalGroupedDataFrame class:

    • avg

    • max

  • Added the following new methods to the DataFrame class:

    • groupByGroupingSets

    • clone

    • createOrReplaceTempView

  • Added the following new functions to the functions object:

    • toScalar

  • Added a Session.file object, which provides the following new methods for performing file operations:

    • get

    • put

  • Made the following changes to the Session.createDataFrame method:

    • Added support for user-provided schemas.

    • Added support for specifying an array/map of variant/geography data.

    • Added support for Geography/Variant data types in UDFs.

  • Added registerPermanent methods to the UDFRegistration class.

Bug fixes

  • Fixed a problem when the DataFrame column name contains quotation marks.

  • Fixed a problem that prevented escaping data that contains backslashes, single quotes, and newline characters.

  • Fixed a problem where UDF creation failed with the error message “code too large”.

  • Fixed a problem where the UDF closure failed to capture the value of a local string variable.

  • Added the result schema for the following SQL statements:

    • GRANT/REVOKE

    • DESCRIBE

    • CREATE

    • USE

  • Fixed a problem when using Snowpark in Visual Studio Code with the Metals extension to create a UDF.