Snowpark Library for Python release notes for 2023¶

This article contains the release notes for the Snowpark Library for Python, including the following when applicable:

Behavior changes
New features
Customer-facing bug fixes

Snowflake uses semantic versioning for Snowpark Library for Python updates.
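
Because the versions are semantic, they compare correctly only component by component, not as plain strings. The following is a minimal standard-library sketch of that comparison; the helper name parse_version is illustrative and is not part of the Snowpark library.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


releases = ["1.9.0", "1.11.1", "1.10.0"]

# Tuples compare element by element, so 1.10.0 sorts after 1.9.0.
newest_first = sorted(releases, key=parse_version, reverse=True)
print(newest_first)  # ['1.11.1', '1.10.0', '1.9.0']
```

A plain string sort would place 1.9.0 last, which is why the numeric key matters when pinning or comparing releases.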

See Snowpark Developer Guide for Python for documentation.

Version 1.11.1 (2023-12-07)¶

Version 1.11.1 of the Snowpark library introduces some new features.

New features¶

Added the conn_error attribute to SnowflakeSQLException, which stores the whole underlying exception from snowflake-connector-python.
Added support for RelationalGroupedDataFrame.pivot() so that pivot can be used in the pattern DataFrame.group_by(...).pivot(...).
Added the experimental feature, Local Testing Mode, which allows you to create and operate on Snowpark Python DataFrames locally without connecting to a Snowflake account. You can use the local testing framework to test your DataFrame operations locally, on your development machine or in a CI (continuous integration) pipeline, before deploying code changes to your account.
Added support for the new function arrays_to_object in snowflake.snowpark.functions.
Added support for the vector data type.
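
As a rough illustration of Local Testing Mode, the sketch below opts in through the session builder. It assumes that the "local_testing" configuration key is how the experimental mode is enabled in your installed version. The helper name create_local_session is hypothetical, and the builder is passed in as an argument (in real code, Session.builder) so that the snippet itself needs no Snowflake account.

```python
def create_local_session(builder):
    """Create a session that runs DataFrame operations locally.

    `builder` is expected to be `snowflake.snowpark.Session.builder`.
    The "local_testing" key is an assumption about the experimental
    feature; check the documentation for your installed version.
    """
    return builder.config("local_testing", True).create()
```

With Snowpark installed, calling create_local_session(Session.builder) in a test fixture should give a session whose DataFrames are evaluated on the local machine.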

Dependency updates¶

Bumped the cloudpickle dependency to work with cloudpickle==2.2.1.
Updated snowflake-connector-python to version 3.4.0.

Bug fixes¶

DataFrame column names quoting check now supports newline characters.
Fixed a bug where a DataFrame generated by session.read.with_metadata created an inconsistent table when doing df.write.save_as_table.

Version 1.10.0 (2023-11-03)¶

Version 1.10.0 of the Snowpark library introduces some new features.

New features¶

Added support for managing case sensitivity in DataFrame.to_local_iterator().
Added support for specifying vectorized UDTF’s input column names by using the optional parameter input_names in UDTFRegistration.register, UDTFRegistration.register_file, and functions.pandas_udtf. By default, RelationalGroupedDataFrame.applyInPandas will infer the column names from current DataFrame schema.
Added sql_error_code and raw_message attributes to SnowflakeSQLException when it is caused by a SQL exception.
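
The new exception attributes can be read defensively, so that the same handler also works on older library versions or on unrelated exceptions. This is a minimal sketch; the helper name describe_sql_error is hypothetical.

```python
def describe_sql_error(exc: Exception) -> str:
    """Summarize an exception using sql_error_code and raw_message if present."""
    code = getattr(exc, "sql_error_code", None)
    raw = getattr(exc, "raw_message", None)
    if code is None and raw is None:
        # Not caused by a SQL exception, or raised by an older library version.
        return str(exc)
    return f"SQL error {code}: {raw}"
```

A typical use would be inside the except block that wraps a DataFrame action, passing the caught exception to the helper before logging it.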

Bug fixes¶

Fixed a bug in DataFrame.to_pandas() where converting Snowpark DataFrames to Pandas DataFrames was losing precision on integers with more than 19 digits.
Fixed a bug in session.add_packages where it could not handle a requirement specifier that contained a project name with an underscore and a version.
Fixed a bug in DataFrame.limit() when offset is used and the parent DataFrame uses limit. Now the offset won’t impact the parent DataFrame’s limit.
Fixed a bug in DataFrame.write.save_as_table where DataFrames created from the read API could not save data into Snowflake because of an invalid column name $1.
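
The 19-digit threshold in the DataFrame.to_pandas() fix matches the width of a signed 64-bit integer. The standard-library sketch below illustrates only that general numeric limit; it is not the library's conversion code, and the sample value is made up.

```python
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold

big = 12345678901234567890123  # 23 digits, too large for int64

print(len(str(INT64_MAX)))     # 19
print(big > INT64_MAX)         # True
print(int(float(big)) == big)  # False: a 64-bit float cannot store every digit
```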

Behavior changes¶

Changed the behavior of date_format:
- The format argument changed from optional to required.
- The returned result changed from a date object to a date-formatted string.
When a window function or a sequence-dependent data generator (normal, zipf, uniform, seq1, seq2, seq4, seq8) function is used, the sort and filter operation will no longer be flattened when generating the query.
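
For the date_format change, a call now has to pass the format explicitly and should expect a string column back. The sketch below assumes a Snowpark DataFrame with a DATE column named order_date; the column names and the helper name are hypothetical.

```python
def format_order_dates(df):
    """Return a DataFrame with one string column holding the formatted date.

    Assumes `df` is a Snowpark DataFrame with a DATE column named
    "order_date" (a hypothetical name used for illustration).
    """
    # Imported lazily so this module can be loaded without Snowpark installed.
    from snowflake.snowpark.functions import date_format

    # The format argument is passed explicitly; the result is a formatted string.
    return df.select(date_format("order_date", "YYYY/MM/DD").alias("order_day"))
```

Code that previously relied on the result being a date object would need an explicit conversion back to a date after this change.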

Version 1.9.0 (2023-10-16)¶

Version 1.9.0 of the Snowpark library introduces some new features.

New features¶

Added support for the Python 3.11 runtime environment.
Added support for PythonObjJSONEncoder JSON-serializable objects in ARRAY and OBJECT literals.

Dependency updates¶

Re-added the dependency of typing-extensions.

Bug fixes¶

Fixed a bug where imports from permanent stage locations were ignored for temporary stored procedures, UDTFs, UDFs, and UDAFs.
Reverted to using the CTAS (CREATE TABLE AS SELECT) statement for DataFrameWriter.save_as_table, which does not need insert permission for writing tables.

Version 1.8.0 (2023-09-14)¶

Version 1.8.0 of the Snowpark library introduces some new features.

New features¶

Added support for VOLATILE and IMMUTABLE keywords when registering UDFs.
Added support for specifying clustering keys when saving DataFrames using DataFrame.save_as_table.
Session.create_dataframe now accepts Iterable objects as input for schema when creating DataFrames.
Added the DataFrame.session property to return a Session object.
Added the Session.session_id property to return an integer that represents the session ID.
Added the Session.connection property to return a SnowflakeConnection object.
Added support for creating a Snowpark session from a configuration file or environment variables.
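
The three new properties can be combined to inspect where a DataFrame came from. This is a minimal sketch that touches only the properties named above; the helper name session_summary is hypothetical.

```python
def session_summary(df):
    """Collect basic facts about the session that owns a DataFrame."""
    session = df.session  # DataFrame.session returns the owning Session
    return {
        "session_id": session.session_id,  # integer session ID
        "connection_class": type(session.connection).__name__,
    }
```

This can be handy in logging, for example to record which session produced a given result without passing the Session object around separately.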

Dependency updates¶

Updated snowflake-connector-python to 3.2.0.

Bug fixes¶

Fixed a bug where an automatic package upload would raise ValueError even when compatible package versions were added in session.add_packages.
Fixed a bug where table stored procedures were not registered correctly when using register_from_file.
Fixed a bug where DataFrame joins failed with an invalid_identifier error.
Fixed a bug where DataFrame.copy disabled the SQL simplifier for the returned copy.
Fixed a bug where session.sql().select() would fail if any parameters were specified to session.sql().

Version 1.7.0 (2023-08-28)¶

Version 1.7.0 of the Snowpark library introduces some new features.

Behavior changes¶

When stored procedures, UDFs, UDTFs, and UDAFs are created with the parameter is_permanent=False, temporary objects are now created even when stage_name is provided. Because the default value of is_permanent is False, users will notice a change in behavior if they do not explicitly set it to True for permanent objects.
types.StructField now quotes the column identifier by default.
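
Given the is_permanent change, code that needs a permanent object should state so explicitly rather than rely on the stage alone. The sketch below registers a UDF through session.udf.register, where the stage is passed as stage_location; the function add_one, the helper name, and the stage value are hypothetical.

```python
def add_one(x: int) -> int:
    """Example function; its type hints describe the UDF signature."""
    return x + 1


def register_permanent_udf(session, stage_name: str):
    """Register add_one as a permanent UDF on the given stage."""
    return session.udf.register(
        add_one,
        name="add_one",
        is_permanent=True,          # without this, a temporary UDF is created
        stage_location=stage_name,  # for example "@my_stage"
        replace=True,
    )
```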

New features¶

Added parameters external_access_integrations and secrets that can be used when creating a UDF, UDTF or stored procedure from Snowpark Python to allow integration with external access.
Added support for these new functions in snowflake.snowpark.functions: array_flatten and flatten.
Added support for apply_in_pandas in snowflake.snowpark.relational_grouped_dataframe.
Added support for replicating your local Python environment on Snowflake via Session.replicate_local_environment.

Bug fixes¶

Fixed a bug where session.create_dataframe failed to properly set nullable columns where nullability was affected by order or when data was given.
Fixed a bug where DataFrame.select could not identify and alias columns when using table functions when output columns of the table function overlapped with columns in the DataFrame.

Version 1.6.1 (2023-08-02)¶

Behavior changes¶

DataFrameWriter.save_as_table now respects the nullable field of the schema provided by the user, or of the schema inferred from user input data.

New features¶

Added support for new functions in snowflake.snowpark.functions:
- array_sort
- sort_array
- array_min
- array_max
- explode_outer
Added support for pure Python packages specified via Session.add_requirements or Session.add_packages. They are now usable in stored procedures and UDFs even if packages are not present on the Snowflake Anaconda channel.
Added the Session parameters custom_packages_upload_enabled and custom_packages_force_upload_enabled to enable the support for pure Python packages described above. Both parameters default to False.
Added support for specifying package requirements by passing a conda environment YAML file to Session.add_requirements.
Added support for asynchronous execution of multi-query dataframes that contain binding variables.
Added support for renaming multiple columns in DataFrame.rename.
Added support for Geometry datatypes.
Added support for params in session.sql() in stored procedures.
Added support for user-defined aggregate functions (UDAFs). This feature is currently in private preview.
Added support for vectorized user-defined table functions (vectorized UDTFs). This feature is currently in public preview.
Added support for Snowflake Timestamp variants (i.e., TIMESTAMP_NTZ, TIMESTAMP_LTZ, TIMESTAMP_TZ):
- Added TimestampTimezone as an argument in TimestampType constructor.
- Added type hints: NTZ, LTZ, TZ and Timestamp to annotate functions when registering UDFs.

Improvements¶

Removed redundant dependency typing-extensions.
DataFrame.cache_result now creates a temp table of fully-qualified names under the current database and schema.

Bug fixes¶

Fixed a bug where a type check was performed on pandas before it was imported.
Fixed a bug when creating a UDF from numpy.ufunc.
Fixed a bug where DataFrame.union was not generating the correct Selectable.schema_query when SQL simplifier is enabled.

Dependency updates¶

Updated snowflake-connector-python to version 3.0.4.

Version 1.5.1 (2023-06-20)¶

New features and updates¶

Added support for the Python 3.10 runtime environment.

Version 1.5.0 (2023-06-13)¶

Behavior changes¶

Aggregation results, from functions such as DataFrame.agg and DataFrame.describe, no longer strip away non-printing characters from column names.

New features and updates¶

Added support for the Python 3.9 runtime environment.
Added support for new functions in snowflake.snowpark.functions:
- array_generate_range
- array_unique_agg
- collect_set
- sequence
Added support for registering and calling stored procedures with the TABLE return type.
Added support for parameter length in StringType() to specify the maximum number of characters that can be stored by the column.
Added the alias functions.element_at() for functions.get().
Added the alias Column.contains for functions.contains.
Added the experimental feature DataFrame.alias.
Added support for querying metadata columns from stage when creating DataFrame using DataFrameReader.
Added support for StructType.add to append more fields to existing StructType objects.
Added support for parameter execute_as in StoredProcedureRegistration.register_from_file() to specify stored procedure caller rights.

Bug fixes¶

Fixed a bug where DataFrame.join_table_function did not run all of the necessary queries to set up the join table function when the SQL simplifier was enabled.
Fixed type hint declaration for custom types: ColumnOrName, ColumnOrLiteralStr, ColumnOrSqlExpr, LiteralType and ColumnOrLiteral that were breaking mypy checks.
Fixed a bug where DataFrameWriter.save_as_table and DataFrame.copy_into_table failed to parse fully qualified table names.

Version 1.4.0 (2023-04-24)¶

New features¶

Added support for session.getOrCreate.
Added support for alias Column.getField.
Added support for new functions in snowflake.snowpark.functions:
- date_add and date_sub to make add and subtract operations easier.
- daydiff
- explode
- array_distinct
- regexp_extract
- struct
- format_number
- bround
- substring_index
Added the parameter skip_upload_on_content_match when creating UDFs, UDTFs, and stored procedures using register_from_file to skip uploading files to a stage if the same version of the files is already on the stage.
Added support for the DataFrame.save_as_table method to take table names that contain dots.
Flattened generated SQL when DataFrame.filter() or DataFrame.order_by() is followed by a projection statement (e.g. DataFrame.select(), DataFrame.with_column()).
Added support for creating dynamic tables (in private preview) using DataFrame.create_or_replace_dynamic_table.
Added an optional argument, params, in session.sql() to support binding variables. Note that this argument is not supported in stored procedures yet.
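
The params argument lets a query bind values instead of formatting them into the SQL text. A minimal sketch, assuming a table named orders with the columns shown (both hypothetical) and the ? placeholder style:

```python
def orders_for_customer(session, customer_id: int):
    """Fetch the orders of one customer using a bound variable."""
    query = "SELECT order_id, amount FROM orders WHERE customer_id = ?"
    # Each ? placeholder is filled, in order, from the params list.
    return session.sql(query, params=[customer_id]).collect()
```

Binding the value keeps the SQL text constant across calls and avoids assembling the statement with string concatenation.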

Bug fixes¶

Fixed a bug in strtok_to_array where an exception was thrown when a delimiter was passed in.
Fixed a bug in session.add_import where the module had the same namespace as other dependencies.

Version 1.3.0 (2023-03-28)¶

New features¶

Added support for the delimiters parameter in functions.initcap().
Added support for functions.hash() to accept a variable number of input expressions.
Added the Session.conf API for getting, setting, or checking the mutability of any runtime configuration.
Added support for managing case sensitivity in Row results from DataFrame.collect using the case_sensitive parameter.
Added indexer support for snowflake.snowpark.types.StructType.
Added a keyword argument log_on_exception to DataFrame.collect and DataFrame.collect_no_wait to optionally disable error logging for SQL exceptions.

Bug fixes¶

Fixed a bug where a DataFrame set operation (DataFrame.subtract, DataFrame.union, etc.) called after another DataFrame set operation and DataFrame.select or DataFrame.with_column threw an exception.
Fixed a bug where chained sort statements were overwritten by the SQL simplifier.

Improvements¶

Simplified JOIN queries to use constant subquery aliases (SNOWPARK_LEFT, SNOWPARK_RIGHT) by default. Users can disable this at runtime with session.conf.set('use_constant_subquery_alias', False) to use randomly generated alias names instead.
Allowed specifying statement parameters in session.call().
Enabled the uploading of large pandas DataFrames in stored procedures by defaulting to a chunk size of 100,000 rows.
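
The constant subquery aliases can be switched off per session with the call quoted above. The sketch below wraps that call and reads the value back through session.conf.get; the helper name use_random_join_aliases is hypothetical.

```python
def use_random_join_aliases(session):
    """Turn off constant subquery aliases and return the stored setting."""
    session.conf.set("use_constant_subquery_alias", False)
    return session.conf.get("use_constant_subquery_alias")
```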

Version 1.2.0 (2023-03-02)¶

New features and updates¶

Added support for displaying source code as comments in the generated scripts when registering stored procedures. This is enabled by default; to turn it off, specify source_code_display=False at registration.
Added a parameter if_not_exists when creating a UDF, UDTF, or stored procedure from Snowpark Python to skip creating the specified function or procedure if it already exists.
snowflake.snowpark.functions.get now accepts integers when extracting a value from an array.
Added functions.reverse to provide access to the Snowflake built-in function REVERSE.
Added the parameter require_scoped_url in snowflake.snowpark.files.SnowflakeFile.open() (in Private Preview) to replace is_owner_file, which is marked for deprecation.

Bug fixes¶

Fixed a bug that overwrote paramstyle to qmark when creating a Snowpark session.
Fixed a bug where df.join(..., how="cross") failed with SnowparkJoinException: (1112): Unsupported using join type 'Cross'.
Fixed a bug where querying a DataFrame column created from chained function calls used a wrong column name.

Version 1.1.0 (2023-01-26)¶

New features and updates¶

Added asc, asc_nulls_first, asc_nulls_last, desc, desc_nulls_first, desc_nulls_last, date_part, and unix_timestamp in functions.
Added the property DataFrame.dtypes to return a list of column name and data type pairs.
Added the following aliases:
- functions.expr() for functions.sql_expr().
- functions.date_format() for functions.to_date().
- functions.monotonically_increasing_id() for functions.seq8().
- functions.from_unixtime() for functions.to_timestamp().

Bug fixes¶

Fixed a bug in the SQL simplifier that did not handle Column alias and join correctly in some cases. See https://github.com/snowflakedb/snowpark-python/issues/658 for details.
Fixed a bug in the SQL simplifier that generated wrong column names for function calls, NaN and INF.

Improvements¶

The session parameter PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER will be True after Snowflake 7.3 is released. In snowpark-python, session.sql_simplifier_enabled reads the value of PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER by default, meaning that the SQL simplifier is enabled by default after the Snowflake 7.3 release. To turn it off, set PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER in Snowflake to False or run session.sql_simplifier_enabled = False from Snowpark. Using the SQL simplifier is recommended because it helps generate more concise SQL.
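
When the simplifier needs to be off only for one query, for example while comparing the generated SQL, the attribute can be toggled temporarily. The sketch below uses only session.sql_simplifier_enabled as described above; the context manager name is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def sql_simplifier_disabled(session):
    """Temporarily disable the SQL simplifier, then restore the old value."""
    previous = session.sql_simplifier_enabled
    session.sql_simplifier_enabled = False
    try:
        yield session
    finally:
        # Restore the earlier setting even if the body raises.
        session.sql_simplifier_enabled = previous
```

Inside a with sql_simplifier_disabled(session): block, DataFrames built from that session would generate unsimplified SQL, and the earlier setting returns on exit.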