Snowpark Library for Python release notes for 2024

This article contains the release notes for the Snowpark Library for Python, including the following when applicable:

  • Behavior changes

  • New features

  • Customer-facing bug fixes

Snowflake uses semantic versioning for Snowpark Library for Python updates.

Version 1.18.0 (2024-05-28)

Version 1.18.0 of the Snowpark library introduces some new features.

New features

  • Added the DataFrame.cache_result and Series.cache_result methods for users to persist DataFrame and Series objects to a temporary table for the duration of a session to improve latency of subsequent operations.

Improvements

  • Added support for DataFrame.pivot_table with no index parameter and with the margins parameter.

  • Updated the signature of DataFrame.shift, Series.shift, DataFrameGroupBy.shift, and SeriesGroupBy.shift to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added suffix argument or sequence values of periods.

  • Re-added support for Series.str.split.

Bug fixes

  • Fixed an issue with mixed columns for string methods (Series.str.*).

Local testing updates

New features

  • Added support for the following DataFrameReader read options to file formats CSV and JSON:

    • PURGE

    • PATTERN

    • INFER_SCHEMA with value False

    • ENCODING with value UTF8

  • Added support for DataFrame.analytics.moving_agg and DataFrame.analytics.cumulative_agg.

  • Added support for the if_not_exists parameter during UDF and stored procedure registration.

Bug fixes

  • Fixed a bug with processing time formats where the fractional second part was not handled properly.

  • Fixed a bug that caused function calls on * to fail.

  • Fixed a bug that prevented the creation of map and struct type objects.

  • Fixed a bug where the function date_add was unable to handle some numeric types.

  • Fixed a bug where TimestampType casting resulted in incorrect data.

  • Fixed a bug that caused DecimalType data to have incorrect precision in some cases.

  • Fixed a bug where referencing a missing table or view raised an IndexError.

  • Fixed a bug where the mocked function to_timestamp_ntz could not handle None data.

  • Fixed a bug where mocked UDFs handled output data of None improperly.

  • Fixed a bug where DataFrame.with_column_renamed ignored attributes from parent DataFrames after join operations.

  • Fixed a bug where the integer precision of large values was lost when converted to a pandas DataFrame.

  • Fixed a bug where the schema of a datetime object was wrong when creating a DataFrame from a pandas DataFrame.

  • Fixed a bug in the implementation of Column.equal_nan where null data was handled incorrectly.

  • Fixed a bug where DataFrame.drop ignored attributes from parent DataFrames after join operations.

  • Fixed a bug in mocked function date_part where column type was set incorrectly.

  • Fixed a bug where DataFrameWriter.save_as_table did not raise exceptions when inserting null data into non-nullable columns.

  • Fixed a bug in the implementation of DataFrameWriter.save_as_table where:

    • Append or truncate failed when incoming data had a different schema than the existing table.

    • Truncate failed when incoming data did not specify columns that are nullable.
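
The integer-precision fix in the list above is easier to appreciate with a quick check of why large integers are fragile. The following is a minimal sketch in plain Python, not the library's conversion code. It assumes the loss came from routing integers through 64-bit floats, which the release note does not state.

```python
# Illustration only: a float64 has a 53-bit significand, so not every
# integer above 2**53 survives a conversion to float.
big = 2**53 + 1

# Round-tripping through float silently changes the value.
via_float = int(float(big))

# Keeping the value as an integer preserves every digit.
via_int = int(big)

print(big, via_float, via_int)
```

If the assumption holds, keeping such columns in an integer dtype end to end is what avoids the drift.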

Improvements

  • Removed the dependency check for pyarrow because it is not used.

  • Improved the target type coverage of Column.cast, adding support for casting to boolean and all integral types.

  • Aligned the error experience when calling UDFs and stored procedures.

  • Added appropriate error messages for the is_permanent and anonymous options in UDFs and stored procedures registration to make it clearer that those features are not yet supported.

  • File read operations with unsupported options and values now raise NotImplementedError instead of warnings and unclear error information.

Version 1.17.0 (2024-05-21)

Version 1.17.0 of the Snowpark library introduces some new features.

New features

  • Added support to add a comment on tables and views using the functions listed below:

    • DataFrameWriter.save_as_table

    • DataFrame.create_or_replace_view

    • DataFrame.create_or_replace_temp_view

    • DataFrame.create_or_replace_dynamic_table
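
As a rough sketch of how the comment support might be used, the helpers below pass a comment keyword to two of the functions listed above. The helper names, table and view names, and the choice of mode="overwrite" are assumptions made for this example; df is assumed to be a snowflake.snowpark.DataFrame created elsewhere.

```python
def save_with_comment(df, table_name: str, comment: str) -> None:
    """Write `df` to `table_name` and attach a table comment.

    `df` is assumed to be a snowflake.snowpark.DataFrame; this helper is
    hypothetical and only forwards its arguments.
    """
    df.write.save_as_table(table_name, mode="overwrite", comment=comment)


def create_view_with_comment(df, view_name: str, comment: str) -> None:
    """Create or replace a view over `df` with a comment attached."""
    df.create_or_replace_view(view_name, comment=comment)
```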

Improvements

  • Improved the error message to remind users to set {"infer_schema": True} when reading a CSV file without specifying its schema.
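
The option mentioned in the error message can be set on the reader before calling csv. The sketch below assumes session is an existing snowflake.snowpark.Session and that the stage path points at CSV files; the helper name is hypothetical.

```python
def read_csv_with_inferred_schema(session, stage_path: str):
    """Return a DataFrame for the CSV files at `stage_path`.

    `session` is assumed to be a snowflake.snowpark.Session. No explicit
    schema is supplied, so schema inference is requested instead.
    """
    return session.read.options({"infer_schema": True}).csv(stage_path)
```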

Local testing updates

New features

  • Added support for NumericType and VariantType data conversion in the mocked functions to_timestamp_ltz, to_timestamp_ntz, to_timestamp_tz and to_timestamp.

  • Added support for DecimalType, BinaryType, ArrayType, MapType, TimestampType, DateType and TimeType data conversion in the mocked function to_char.

  • Added support for the following APIs:

    • snowflake.snowpark.functions.to_varchar

    • snowflake.snowpark.DataFrame.pivot

    • snowflake.snowpark.Session.cancel_all

  • Introduced a new exception class snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException.

  • Added support for casting to FloatType.

Bug fixes

  • Fixed a bug where stored procedures and UDFs removed imports that were already in sys.path during the clean-up step.

  • Fixed a bug where the fractional second part was not handled properly when processing datetime formats.

  • Fixed a bug where file operations on the Windows platform were unable to properly handle file separators in directory names.

  • Fixed a bug on the Windows platform where an IntervalType column with integer data could not be processed when reading a pandas DataFrame.

  • Fixed a bug that prevented users from being able to select multiple columns with the same alias.

  • Fixed a bug where Session.get_current_[schema|database|role|user|account|warehouse] returned uppercased identifiers when the identifiers were quoted.

  • Fixed a bug where the functions substr and substring could not handle a zero-based start_expr.
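
The sys.path fix in the list above comes down to removing only what was added. The following is a minimal sketch of that idea in plain Python, not the library's implementation; the temporary_imports helper and the example paths are hypothetical.

```python
import sys
from contextlib import contextmanager
from typing import Iterable, Iterator, List


@contextmanager
def temporary_imports(paths: Iterable[str]) -> Iterator[List[str]]:
    """Add `paths` to sys.path for the duration of the block.

    Only entries that were not already present are added, and only those
    are removed afterwards, so pre-existing entries survive clean-up.
    """
    added = []
    for path in paths:
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    try:
        yield added
    finally:
        for path in added:
            if path in sys.path:
                sys.path.remove(path)
```

Tracking the added entries separately is what keeps a path that was already importable before the call from disappearing afterwards.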

Improvements

  • Standardized the error experience by raising SnowparkLocalTestingException in error cases, which is on par with the SnowparkSQLException raised in non-local execution.

  • Improved the error experience of the Session.write_pandas method so that NotImplementedError is raised when it is called.

  • Aligned the error experience with reusing a closed session in non-local execution.

Version 1.16.0 (2024-05-08)

Version 1.16.0 of the Snowpark library introduces some new features.

New features

  • Added snowflake.snowpark.Session.lineage.trace to explore data lineage of Snowflake objects.

  • Added support for registering stored procedures with packages given as Python modules.

  • Added support for structured type schema parsing.

Bug fixes

  • Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.

Local testing updates

New features

  • Added support for StringType, TimestampType and VariantType data conversion in the mocked function to_date.

  • Added support for the following APIs:

    • snowflake.snowpark.functions:

      • get

      • concat

      • concat_ws

Bug fixes

  • Fixed a bug that caused NaT and NaN values to not be recognized.

  • Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.

  • Fixed a bug where DataFrameReader.csv was unable to handle quoted values containing a delimiter.

  • Fixed a bug where an arithmetic calculation involving a None value produced math.nan instead of None.

  • Fixed a bug in the functions sum and covar_pop where the output was not math.nan when the data contained a math.nan value.

  • Fixed a bug where stage operations could not handle directories.

  • Fixed a bug where DataFrame.to_pandas did not convert Snowflake numeric types with precision 38 to int64.
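
The None and math.nan fixes in the list above describe two different propagation rules. The sketch below restates them in plain Python under the assumption that None stands for SQL NULL and that aggregates skip NULL inputs; the function names are hypothetical and this is not the mocked implementation.

```python
import math
from typing import Iterable, Optional


def null_aware_add(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Add two values, returning None (SQL NULL) if either input is None."""
    if a is None or b is None:
        return None
    return a + b


def nan_aware_sum(values: Iterable[Optional[float]]) -> Optional[float]:
    """Sum values, skipping None; any NaN input makes the result NaN."""
    present = [v for v in values if v is not None]
    if not present:
        # No non-null inputs: the aggregate itself is NULL.
        return None
    return sum(present)


print(null_aware_add(None, 1.0))
print(nan_aware_sum([1.0, math.nan, None]))
```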

Version 1.15.0 (2024-04-24)

Version 1.15.0 of the Snowpark library introduces some new features.

New features

  • Added truncate save mode in DataFrameWriter to overwrite existing tables by truncating the underlying table instead of dropping it.

  • Added telemetry to calculate query plan height and number of duplicate nodes during collect operations.

  • Added the functions below to unload data from a DataFrame into one or more files in a stage:

    • DataFrame.write.json

    • DataFrame.write.csv

    • DataFrame.write.parquet

  • Added distributed tracing using OpenTelemetry APIs for action functions in DataFrame and DataFrameWriter:

    • snowflake.snowpark.DataFrame:

      • collect

      • collect_nowait

      • to_pandas

      • count

      • show

    • snowflake.snowpark.DataFrameWriter:

      • save_as_table

  • Added support for snow:// URLs to snowflake.snowpark.Session.file.get and snowflake.snowpark.Session.file.get_stream.

  • Added support to register stored procedures and UDFs with a comment.

  • UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.

  • Added support for dynamic pivot. This feature is currently in private preview.

Improvements

  • Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). It is still an experimental feature and it is not enabled by default. You can enable it by setting session.cte_optimization_enabled to True.
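
Because the optimization is experimental, it can be useful to switch it on for a limited scope only. The context manager below is a hypothetical helper written for this example, not part of the library; it assumes session is a snowflake.snowpark.Session exposing the cte_optimization_enabled setting named above.

```python
from contextlib import contextmanager


@contextmanager
def cte_optimization(session):
    """Enable the experimental CTE optimization inside the block only.

    `session` is assumed to be a snowflake.snowpark.Session; the previous
    setting is restored on exit, even if the block raises.
    """
    previous = session.cte_optimization_enabled
    session.cte_optimization_enabled = True
    try:
        yield session
    finally:
        session.cte_optimization_enabled = previous
```

A caller would wrap the query-building code in "with cte_optimization(session):" and leave the rest of the application on the default setting.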

Bug fixes

  • Fixed a bug where statement_params was not passed to query executions that register stored procedures and user-defined functions.

  • Fixed a bug causing snowflake.snowpark.Session.file.get_stream to fail for quoted stage locations.

  • Fixed a bug where an internal type hint in utils.py could raise AttributeError when the underlying module could not be found.

Local testing updates

New features

  • Added support for registering UDFs and stored procedures.

  • Added support for the following APIs:

    • snowflake.snowpark.Session:

      • file.put

      • file.put_stream

      • file.get

      • file.get_stream

      • read.json

      • add_import

      • remove_import

      • get_imports

      • clear_imports

      • add_packages

      • add_requirements

      • clear_packages

      • remove_package

      • udf.register

      • udf.register_from_file

      • sproc.register

      • sproc.register_from_file

    • snowflake.snowpark.functions

      • current_database

      • current_session

      • date_trunc

      • object_construct

      • object_construct_keep_null

      • pow

      • sqrt

      • udf

      • sproc

  • Added support for StringType, TimestampType and VariantType data conversion in the mocked function to_time.

Bug fixes

  • Fixed a bug where columns for constant functions were filled with null values.

  • Fixed to_object, to_array and to_binary to better handle null inputs.

  • Fixed a bug where timestamp data comparison could not handle years beyond 2262.

  • Fixed a bug where Session.builder.getOrCreate did not return the created mock session.
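
The year 2262 in the timestamp fix above is not arbitrary. The sketch below assumes the limit came from storing timestamps as signed 64-bit nanosecond counts since the Unix epoch, as pandas does; the release note does not state the cause. Under that assumption the largest representable instant falls in April 2262.

```python
from datetime import datetime, timedelta

# Largest value a signed 64-bit integer can hold.
INT64_MAX = 2**63 - 1

# Interpreted as nanoseconds since the Unix epoch. datetime only has
# microsecond resolution, so the last three digits are dropped.
latest = datetime(1970, 1, 1) + timedelta(microseconds=INT64_MAX // 1000)

print(latest)
```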

Version 1.14.0 (2024-03-20)

Version 1.14.0 of the Snowpark library introduces some new features.

New features

  • Added support for creating vectorized UDTFs with the process method.

  • Added support for dataframe functions:

    • to_timestamp_ltz

    • to_timestamp_ntz

    • to_timestamp_tz

    • locate

  • Added support for ASOF JOIN type.

  • Added support for the following local testing APIs:

    • snowflake.snowpark.functions:

      • to_double

      • to_timestamp

      • to_timestamp_ltz

      • to_timestamp_ntz

      • to_timestamp_tz

      • greatest

      • least

      • convert_timezone

      • dateadd

      • date_part

    • snowflake.snowpark.Session:

      • get_current_account

      • get_current_warehouse

      • get_current_role

      • use_schema

      • use_warehouse

      • use_database

      • use_role
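
For readers new to the ASOF JOIN type listed above, the sketch below shows the matching rule in plain Python. It is an illustration of the semantics only, not Snowpark code, and it assumes a match condition of "left timestamp >= right timestamp" with no additional equality keys. The asof_join function and the sample rows are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (timestamp, payload)


def asof_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, Optional[str]]]:
    """For each left row, attach the payload of the latest right row whose
    timestamp is less than or equal to the left timestamp (or None)."""
    right_sorted = sorted(right)
    right_times = [ts for ts, _ in right_sorted]
    result = []
    for ts, payload in left:
        # Index of the first right row strictly after `ts`.
        idx = bisect_right(right_times, ts)
        match = right_sorted[idx - 1][1] if idx > 0 else None
        result.append((ts, payload, match))
    return result


trades = [(10, "t1"), (25, "t2"), (5, "t0")]
quotes = [(8, "q1"), (20, "q2")]
print(asof_join(trades, quotes))
```

Each trade is paired with the most recent quote at or before it, and a trade that precedes every quote gets None.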

Improvements

  • Added telemetry to local testing.

  • Improved the error message of DataFrameReader to raise FileNotFound error when reading a path that does not exist or when there are no files under the path.

Bug fixes

  • Fixed a bug in SnowflakePlanBuilder where save_as_table did not correctly filter columns whose names start with $ and are followed by a number.

  • Fixed a bug where statement parameters might have no effect when resolving imports and packages.

  • Fixed bugs in local testing:

    • LEFT ANTI and LEFT SEMI joins drop rows with null values.

    • DataFrameReader.csv incorrectly parses data when the optional parameter field_optionally_enclosed_by is specified.

    • Column.regexp only considers the first entry when pattern is a Column.

    • Table.update raises KeyError when updating null values in the rows.

    • VARIANT columns raise errors at DataFrame.collect.

    • count_distinct does not work correctly when counting.

    • Null values in integer columns raise TypeError.

Version 1.13.0 (2024-02-26)

Version 1.13.0 of the Snowpark library introduces some new features.

New features

  • Added support for an optional date_part argument in function last_day.

  • SessionBuilder.app_name will set the query_tag after the session is created.

  • Added support for the following local testing functions:

    • current_timestamp

    • current_date

    • current_time

    • strip_null_value

    • upper

    • lower

    • length

    • initcap

Improvements

  • Added cleanup logic at interpreter shutdown to close all active sessions.

Bug fixes

  • Fixed a bug in DataFrame.to_local_iterator where the iterator could yield wrong results if another query is executed before the iterator finishes due to wrong isolation level.

  • Fixed a bug that truncated table names in error messages while running a plan with local testing enabled.

  • Fixed a bug where Session.range returned an empty result when the range was large.

Version 1.12.1 (2024-02-08)

Version 1.12.1 of the Snowpark library introduces some new features.

Improvements

  • Use split_blocks=True by default, during to_pandas conversion, for optimal memory allocation. This parameter is passed to pyarrow.Table.to_pandas, which enables PyArrow to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.

Bug fixes

  • Fixed a bug in DataFrame.to_pandas that caused an error when evaluating on a DataFrame with an IntegerType column with null values.

Version 1.12.0 (2024-01-29)

Version 1.12.0 of the Snowpark library introduces some new features.

Behavior changes (API compatible)

  • When parsing data types during a to_pandas operation, we rely on GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as int8 gets returned as int64. Users can fix this by explicitly specifying precision values for their return column.

  • Aligned behavior for Session.call in case of table stored procedures where running Session.call would not trigger a stored procedure unless a collect() operation was performed.

  • StoredProcedureRegistration now automatically adds snowflake-snowpark-python as a package dependency on the client’s local version of the library. An error is thrown if the server cannot support that version.

New features

  • Exposed statement_params in StoredProcedure.__call__.

  • Added two optional arguments to Session.add_import:

    • chunk_size: The number of bytes to hash per chunk of the uploaded files.

    • whole_file_hash: By default only the first chunk of the uploaded import is hashed to save time. When this is set to True each uploaded file is fully hashed instead.

  • Added parameters external_access_integrations and secrets when creating a UDAF from Snowpark Python to allow integration with external access.

  • Added a new method Session.append_query_tag, which allows an additional tag to be added to the current query tag by appending it as a comma separated value.

  • Added a new method Session.update_query_tag, which allows updates to a JSON encoded dictionary query tag.

  • SessionBuilder.getOrCreate will now attempt to replace the singleton it returns when token expiration has been detected.

  • Added the following functions in snowflake.snowpark.functions:

    • array_except

    • create_map

    • sign / signum

  • Added the following functions to DataFrame.analytics:

    • Added the moving_agg function in DataFrame.analytics to enable moving aggregations like sums and averages with multiple window sizes.

    • Added the cumulative_agg function in DataFrame.analytics to enable cumulative aggregations like sums and averages.
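
The chunk_size and whole_file_hash arguments added to Session.add_import describe a trade-off between speed and change detection. The sketch below shows that trade-off with the standard library. The hash algorithm (SHA-256), the default chunk size, and the hash_stream helper are assumptions for illustration, not the library's implementation.

```python
import hashlib
import io
from typing import BinaryIO


def hash_stream(stream: BinaryIO, chunk_size: int = 8192, whole_file_hash: bool = False) -> str:
    """Return a hex digest of the first chunk, or of the whole stream."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        if not whole_file_hash:
            # Default behavior: stop after the first chunk to save time.
            break
    return digest.hexdigest()


# Two payloads that share their first 4 bytes but differ afterwards.
a = b"headAAAA"
b = b"headBBBB"
first_chunk_equal = hash_stream(io.BytesIO(a), 4) == hash_stream(io.BytesIO(b), 4)
full_hash_equal = hash_stream(io.BytesIO(a), 4, True) == hash_stream(io.BytesIO(b), 4, True)
print(first_chunk_equal, full_hash_equal)
```

Hashing only the first chunk is faster but cannot see changes later in the file, which is the case whole_file_hash=True is meant to cover.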

Bug fixes

  • Fixed a bug in DataFrame.na.fill that caused Boolean values to erroneously override integer values.

  • Fixed a bug in Session.create_dataframe where the Snowpark DataFrames created using pandas DataFrames were not inferring the type for timestamp columns correctly. The behavior is as follows:

    • Earlier, timestamp columns without a timezone would be converted to nanosecond epochs and inferred as LongType(), but will now be correctly maintained as timestamp values and be inferred as TimestampType(TimestampTimeZone.NTZ).

    • Earlier, timestamp columns with a timezone would be inferred as TimestampType(TimestampTimeZone.NTZ) and lose timezone information, but will now be correctly inferred as TimestampType(TimestampTimeZone.LTZ) with timezone information retained.

    • Set the session parameter PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME to revert to the old behavior. Snowflake recommends that you update your code to align with the correct behavior because the parameter will be removed in the future.

  • Fixed a bug where DataFrame.to_pandas returned a decimal type when the scale was not 0, which created an object dtype in pandas. The value is now cast to float64 instead.

  • Fixed bugs that wrongly flattened the generated SQL when one of the following happens:

    • DataFrame.filter() is called after DataFrame.sort().limit().

    • DataFrame.sort() or filter() is called on a DataFrame that already has a window function or sequence-dependent data generator column. For instance, df.select("a", seq1().alias("b")).select("a", "b").sort("a") won’t flatten the sort clause anymore.

    • A window or sequence-dependent data generator column is used after DataFrame.limit(). For instance, df.limit(10).select(row_number().over()) won’t flatten the limit and select in the generated SQL.

  • Fixed a bug where aliasing a DataFrame column raised an error when the DataFrame was copied from another DataFrame with an aliased column. For instance,

    from copy import copy
    from snowflake.snowpark.functions import col

    df = df.select(col("a").alias("b"))
    df = copy(df)
    df.select(col("b").alias("c"))  # Threw an error. Now it's fixed.
    
  • Fixed a bug in Session.create_dataframe where the non-nullable field in a schema was not respected for the Boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.

  • Fixed a bug in SQL simplifier where non-select statements in session.sql dropped a SQL query when used with limit().

  • Fixed a bug that raised an exception when session parameter ERROR_ON_NONDETERMINISTIC_UPDATE is true.
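
The SQL-flattening fixes in the list above matter because merging a filter into a query that already has a sort and limit changes which rows are returned. The sketch below reproduces the difference with plain Python lists; it is an analogy for the DataFrame.sort().limit() followed by filter() case, not generated SQL, and the sample values are made up.

```python
rows = [5, 1, 4, 2, 3]

# sort().limit(2).filter(x > 1): take the two smallest rows, then filter.
nested = [x for x in sorted(rows)[:2] if x > 1]

# Incorrectly flattened: the filter is merged below the limit, so it runs
# first and changes which rows the limit sees.
flattened = sorted(x for x in rows if x > 1)[:2]

print(nested, flattened)
```

The two results differ, which is why the filter has to stay in an outer query instead of being folded into the same SELECT as the ORDER BY and LIMIT.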