Snowpark Library for Python release notes for 2023

This article contains the release notes for the Snowpark Library for Python, including the following when applicable:

  • Behavior changes

  • New features

  • Customer-facing bug fixes

Snowflake uses semantic versioning for Snowpark Library for Python updates.
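
Because version numbers follow the MAJOR.MINOR.PATCH pattern, they are best compared numerically rather than as strings. The following is a minimal sketch, assuming plain three-part version strings with no pre-release suffix; parse_version is a hypothetical helper and is not part of the Snowpark library.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


# Tuples compare element by element, so 1.11.1 sorts after 1.9.0.
releases = ["1.9.0", "1.11.1", "1.10.0"]
newest = max(releases, key=parse_version)

# Plain string comparison picks the wrong release, because "9" > "1".
newest_by_string = max(releases)

print(newest, newest_by_string)
```

Comparing parsed tuples is what makes 1.11.1 rank above 1.9.0.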

Version 1.11.1 (2023-12-07)

Version 1.11.1 of the Snowpark library introduces some new features.

New features

  • Added the conn_error attribute to SnowflakeSQLException, which stores the whole underlying exception from snowflake-connector-python.

  • Added support for RelationalGroupedDataFrame.pivot(), so that pivot can be called in the pattern DataFrame.group_by(...).pivot(...).

  • Added the experimental feature, Local Testing Mode, which allows you to create and operate on Snowpark Python DataFrames locally without connecting to a Snowflake account. You can use the local testing framework to test your DataFrame operations locally, on your development machine or in a CI (continuous integration) pipeline, before deploying code changes to your account.

  • Added support for the new function arrays_to_object in snowflake.snowpark.functions.

  • Added support for the vector data type.
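
The group_by(...).pivot(...) pattern mentioned above might be used as in the following sketch. The column names EMPLOYEE, MONTH, and AMOUNT and the helper monthly_totals are assumptions made for illustration; the function expects an existing Snowpark DataFrame and does not open a connection itself.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, List

if TYPE_CHECKING:  # imported for type hints only
    from snowflake.snowpark import DataFrame


def monthly_totals(sales: DataFrame, months: List[str]) -> DataFrame:
    """Return one row per employee with one summed column per month.

    Assumes hypothetical columns EMPLOYEE, MONTH, and AMOUNT.
    """
    return (
        sales.group_by("EMPLOYEE")  # yields a RelationalGroupedDataFrame
        .pivot("MONTH", months)     # pivot column and the values to spread out
        .sum("AMOUNT")              # aggregate applied to each pivoted value
    )
```

A call such as monthly_totals(df, ["JAN", "FEB"]) would then produce one aggregated column for each listed month.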

Dependency updates

  • Bumped the cloudpickle dependency to work with cloudpickle==2.2.1.

  • Updated snowflake-connector-python to version 3.4.0.

Bug fixes

  • The quoting check for DataFrame column names now supports newline characters.

  • Fixed a bug where a DataFrame generated by session.read.with_metadata created an inconsistent table when doing df.write.save_as_table.

Version 1.10.0 (2023-11-03)

Version 1.10.0 of the Snowpark library introduces some new features.

New features

  • Added support for managing case sensitivity in DataFrame.to_local_iterator().

  • Added support for specifying a vectorized UDTF’s input column names by using the optional parameter input_names in UDTFRegistration.register, UDTFRegistration.register_file, and functions.pandas_udtf. By default, RelationalGroupedDataFrame.applyInPandas infers the column names from the current DataFrame schema.

  • Added sql_error_code and raw_message attributes to SnowflakeSQLException when it is caused by a SQL exception.

Bug fixes

  • Fixed a bug in DataFrame.to_pandas() where converting Snowpark DataFrames to pandas DataFrames lost precision on integers with more than 19 digits.

  • Fixed a bug in session.add_packages where it could not handle a requirement specifier that contained a project name with an underscore and a version.

  • Fixed a bug in DataFrame.limit() when offset is used and the parent DataFrame uses limit. Now the offset won’t impact the parent DataFrame’s limit.

  • Fixed a bug in DataFrame.write.save_as_table where DataFrames created from the read API could not save data into Snowflake because of an invalid column name $1.
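
The 19-digit threshold in the to_pandas() fix matches the size of the largest signed 64-bit integer, which has exactly 19 digits. The following standard-library sketch illustrates the general failure mode for larger values, assuming the value passes through a 64-bit float. It is not a reconstruction of the Snowpark fix, whose internals these notes do not describe.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1          # largest signed 64-bit integer
big = 12345678901234567890123  # 23 digits, too large for a 64-bit integer

# A 64-bit float keeps only 53 bits of mantissa, so a round trip
# through float silently changes the value.
round_tripped = int(float(big))

# Decimal (like Python's own int) keeps every digit.
exact = int(Decimal(str(big)))

print(len(str(INT64_MAX)))   # digit count of the int64 maximum
print(round_tripped == big)  # whether the float round trip preserved the value
print(exact == big)          # whether the Decimal round trip preserved the value
```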

Behavior changes

  • Changed the behavior of date_format:

    • The format argument changed from optional to required.

    • The returned result changed from a date object to a date-formatted string.

  • When a window function or a sequence-dependent data generator function (normal, zipf, uniform, seq1, seq2, seq4, seq8) is used, sort and filter operations are no longer flattened when generating the query.

Version 1.9.0 (2023-10-16)

Version 1.9.0 of the Snowpark library introduces some new features.

New features

  • Added support for the Python 3.11 runtime environment.

  • Added support for using PythonObjJSONEncoder JSON-serializable objects in ARRAY and OBJECT literals.

Dependency updates

  • Re-added the dependency of typing-extensions.

Bug fixes

  • Fixed a bug where imports from permanent stage locations were ignored for temporary stored procedures, UDTFs, UDFs, and UDAFs.

  • Reverted to using the CTAS (CREATE TABLE AS SELECT) statement for DataFrameWriter.save_as_table, which does not need the insert permission to write tables.

Version 1.8.0 (2023-09-14)

Version 1.8.0 of the Snowpark library introduces some new features.

New features

  • Added support for VOLATILE and IMMUTABLE keywords when registering UDFs.

  • Added support for specifying clustering keys when saving DataFrames using DataFrame.save_as_table.

  • Added support for Iterable objects as the schema input when creating DataFrames using Session.create_dataframe.

  • Added the DataFrame.session property to return a Session object.

  • Added the Session.session_id property to return an integer that represents the session ID.

  • Added the Session.connection property to return a SnowflakeConnection object.

  • Added support for creating a Snowpark session from a configuration file or environment variables.
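
As a rough illustration of the clustering-key support and the new DataFrame.session and Session.session_id properties, consider the sketch below. It assumes the keyword argument is named clustering_keys on the DataFrame writer's save_as_table method. The helper save_clustered and the table and column names passed to it are hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, List

if TYPE_CHECKING:  # imported for type hints only
    from snowflake.snowpark import DataFrame


def save_clustered(df: DataFrame, table_name: str, keys: List[str]) -> int:
    """Overwrite table_name with df, clustered by keys; return the session ID."""
    # Assumption: the keyword for clustering keys is `clustering_keys`.
    df.write.save_as_table(table_name, mode="overwrite", clustering_keys=keys)
    # DataFrame.session and Session.session_id are the properties added in 1.8.0.
    return df.session.session_id
```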

Dependency updates

  • Updated snowflake-connector-python to 3.2.0.

Bug fixes

  • Fixed a bug where an automatic package upload would raise ValueError even when compatible package versions were added in session.add_packages.

  • Fixed a bug where table stored procedures were not registered correctly when using register_from_file.

  • Fixed a bug where DataFrame joins failed with an invalid_identifier error.

  • Fixed a bug where DataFrame.copy disabled the SQL simplifier for the returned copy.

  • Fixed a bug where session.sql().select() would fail if any parameters were specified to session.sql().

Version 1.7.0 (2023-08-28)

Version 1.7.0 of the Snowpark library introduces some new features.

Behavior changes

  • When creating stored procedures, UDFs, UDTFs, and UDAFs with the parameter is_permanent=False, temporary objects are created even when stage_name is provided. Because is_permanent defaults to False, users who do not explicitly set it to True for permanent objects will notice a change in behavior.

  • types.StructField now quotes the column identifier by default.
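
The is_permanent change can be pictured with the following sketch, which registers a UDF through an existing session. The UDF name add_one, the stage @my_stage, and the helper register_add_one are hypothetical. The stage is passed here through the stage_location keyword of session.udf.register.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from snowflake.snowpark import Session


def add_one(x: int) -> int:
    """Plain Python function; the type hints describe the UDF signature."""
    return x + 1


def register_add_one(session: Session, permanent: bool):
    """Register add_one as a UDF.

    With permanent=False a temporary UDF is created, even though a stage
    is supplied. Pass permanent=True explicitly to get a permanent object.
    """
    return session.udf.register(
        add_one,
        name="add_one",
        is_permanent=permanent,
        stage_location="@my_stage",  # hypothetical stage name
        replace=True,
    )
```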

New features

  • Added parameters external_access_integrations and secrets that can be used when creating a UDF, UDTF or stored procedure from Snowpark Python to allow integration with external access.

  • Added support for these new functions in snowflake.snowpark.functions: array_flatten and flatten.

  • Added support for apply_in_pandas in snowflake.snowpark.relational_grouped_dataframe.

  • Added support for replicating your local Python environment on Snowflake via Session.replicate_local_environment.

Bug fixes

  • Fixed a bug where session.create_dataframe failed to set nullable columns properly when nullability depended on the order of the data or on the data provided.

  • Fixed a bug where DataFrame.select could not identify and alias columns when using table functions when output columns of the table function overlapped with columns in the DataFrame.

Version 1.6.1 (2023-08-02)

Behavior changes

  • DataFrameWriter.save_as_table now respects the nullable field of the schema provided by the user, or of the schema inferred from the user's input data.

New features

  • Added support for new functions in snowflake.snowpark.functions:

    • array_sort

    • sort_array

    • array_min

    • array_max

    • explode_outer

  • Added support for pure Python packages specified via Session.add_requirements or Session.add_packages. They are now usable in stored procedures and UDFs even if packages are not present on the Snowflake Anaconda channel.

  • Added the Session parameters custom_packages_upload_enabled and custom_packages_force_upload_enabled to enable the pure Python packages feature described above. Both parameters default to False.

  • Added support for specifying package requirements by passing a conda environment YAML file to Session.add_requirements.

  • Added support for asynchronous execution of multi-query dataframes that contain binding variables.

  • Added support for renaming multiple columns in DataFrame.rename.

  • Added support for Geometry datatypes.

  • Added support for params in session.sql() in stored procedures.

  • Added support for user-defined aggregate functions (UDAFs). This feature is currently in private preview.

  • Added support for vectorized user-defined table functions (vectorized UDTFs). This feature is currently in public preview.

  • Added support for Snowflake Timestamp variants (i.e., TIMESTAMP_NTZ, TIMESTAMP_LTZ, TIMESTAMP_TZ):

    • Added TimestampTimezone as an argument in TimestampType constructor.

    • Added type hints: NTZ, LTZ, TZ and Timestamp to annotate functions when registering UDFs.
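
Renaming several columns in a single DataFrame.rename call, as described above, could look like the following sketch. It assumes the method accepts a dictionary that maps existing column names to new ones. The column names and the helper standardize_names are hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from snowflake.snowpark import DataFrame


def standardize_names(df: DataFrame) -> DataFrame:
    """Rename two hypothetical columns in a single call."""
    # Keys are existing column names, values are the new names.
    return df.rename({"CUST_ID": "CUSTOMER_ID", "AMT": "AMOUNT"})
```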

Improvements

  • Removed redundant dependency typing-extensions.

  • DataFrame.cache_result now creates a temp table of fully-qualified names under the current database and schema.

Bug fixes

  • Fixed a bug where a type check happened on pandas before it was imported.

  • Fixed a bug when creating a UDF from numpy.ufunc.

  • Fixed a bug where DataFrame.union did not generate the correct Selectable.schema_query when the SQL simplifier was enabled.

Dependency updates

  • Updated snowflake-connector-python to version 3.0.4.

Version 1.5.1 (2023-06-20)

New features and updates

  • Added support for the Python 3.10 runtime environment.

Version 1.5.0 (2023-06-13)

Behavior changes

  • Aggregation results, from functions such as DataFrame.agg and DataFrame.describe, no longer strip away non-printing characters from column names.

New features and updates

  • Added support for the Python 3.9 runtime environment.

  • Added support for new functions in snowflake.snowpark.functions:

    • array_generate_range

    • array_unique_agg

    • collect_set

    • sequence

  • Added support for registering and calling stored procedures with the TABLE return type.

  • Added support for parameter length in StringType() to specify the maximum number of characters that can be stored by the column.

  • Added the alias functions.element_at() for functions.get().

  • Added the alias Column.contains for functions.contains.

  • Added the experimental feature DataFrame.alias.

  • Added support for querying metadata columns from stage when creating DataFrame using DataFrameReader.

  • Added support for StructType.add to append more fields to existing StructType objects.

  • Added support for parameter execute_as in StoredProcedureRegistration.register_from_file() to specify stored procedure caller rights.

Bug fixes

  • Fixed a bug where DataFrame.join_table_function did not run all of the necessary queries to set up the join table function when the SQL simplifier was enabled.

  • Fixed type hint declaration for custom types: ColumnOrName, ColumnOrLiteralStr, ColumnOrSqlExpr, LiteralType and ColumnOrLiteral that were breaking mypy checks.

  • Fixed a bug where DataFrameWriter.save_as_table and DataFrame.copy_into_table failed to parse fully qualified table names.

Version 1.4.0 (2023-04-24)

New features

  • Added support for session.getOrCreate.

  • Added support for alias Column.getField.

  • Added support for new functions in snowflake.snowpark.functions:

    • date_add and date_sub to make add and subtract operations easier.

    • daydiff

    • explode

    • array_distinct

    • regexp_extract

    • struct

    • format_number

    • bround

    • substring_index

  • Added parameter skip_upload_on_content_match when creating UDFs, UDTFs, and stored procedures using register_from_file to skip uploading files to a stage if the same version of the files are already on the stage.

  • Added support for the DataFrame.save_as_table method to take table names that contain dots.

  • Flattened generated SQL when DataFrame.filter() or DataFrame.order_by() is followed by a projection statement (e.g. DataFrame.select(), DataFrame.with_column()).

  • Added support for creating dynamic tables (in private preview) using DataFrame.create_or_replace_dynamic_table.

  • Added an optional argument, params, in session.sql() to support binding variables. Note that this argument is not supported in stored procedures yet.
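
The params argument of session.sql() binds values to ? placeholders in the statement. The sketch below assumes an existing session; the orders table, its columns, and the helper orders_for_customer are hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, List

if TYPE_CHECKING:  # imported for type hints only
    from snowflake.snowpark import Row, Session


def orders_for_customer(
    session: Session, customer_id: int, min_total: float
) -> List[Row]:
    """Fetch rows from a hypothetical ORDERS table using bound variables."""
    query = "SELECT * FROM orders WHERE customer_id = ? AND total >= ?"
    # Values are bound positionally to the ? placeholders, which avoids
    # building the SQL text by string concatenation.
    return session.sql(query, params=[customer_id, min_total]).collect()
```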

Bug fixes

  • Fixed a bug in strtok_to_array where an exception was thrown when a delimiter was passed in.

  • Fixed a bug in session.add_import where the module had the same namespace as other dependencies.

Version 1.3.0 (2023-03-28)

New features

  • Added support for the delimiters parameter in functions.initcap().

  • Added support for functions.hash() to accept a variable number of input expressions.

  • Added the Session.conf API for getting, setting, or checking the mutability of any runtime configuration.

  • Added support for managing case sensitivity in Row results from DataFrame.collect using case_sensitive parameter.

  • Added indexer support for snowflake.snowpark.types.StructType.

  • Added a keyword argument log_on_exception to DataFrame.collect and DataFrame.collect_no_wait to optionally disable error logging for SQL exceptions.

Bug fixes

  • Fixed a bug where a DataFrame set operation (DataFrame.subtract, DataFrame.union, etc.) threw an exception when called after another DataFrame set operation and DataFrame.select or DataFrame.with_column.

  • Fixed a bug where chained sort statements were overwritten by the SQL simplifier.

Improvements

  • Simplified JOIN queries to use constant subquery aliases (SNOWPARK_LEFT, SNOWPARK_RIGHT) by default. Users can disable this at runtime with session.conf.set('use_constant_subquery_alias', False) to use randomly generated alias names instead.

  • Allowed specifying statement parameters in session.call().

  • Enabled the uploading of large pandas DataFrames in stored procedures by defaulting to a chunk size of 100,000 rows.
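
The Session.conf API and the use_constant_subquery_alias setting described in this release can be combined as in the following sketch. It assumes an existing session, and the helper use_random_join_aliases is hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from snowflake.snowpark import Session

ALIAS_KEY = "use_constant_subquery_alias"


def use_random_join_aliases(session: Session) -> bool:
    """Switch JOIN queries back to randomly generated subquery aliases.

    Returns the value of the setting after the attempted change.
    """
    # Only attempt the change if the configuration is mutable at runtime.
    if session.conf.is_mutable(ALIAS_KEY):
        session.conf.set(ALIAS_KEY, False)
    return session.conf.get(ALIAS_KEY)
```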

Version 1.2.0 (2023-03-02)

New features and updates

  • Added support for displaying source code as comments in the generated scripts when registering stored procedures. This is enabled by default; to turn it off, specify source_code_display=False at registration.

  • Added a parameter if_not_exists when creating a UDF, UDTF, or stored procedure from Snowpark Python to skip creating the specified function or procedure if it already exists.

  • Added support for integers when calling snowflake.snowpark.functions.get to extract a value from an array.

  • Added functions.reverse to expose the Snowflake built-in function REVERSE.

  • Added the parameter require_scoped_url in snowflake.snowpark.files.SnowflakeFile.open() (in private preview) to replace is_owner_file, which is marked for deprecation.

Bug fixes

  • Fixed a bug that overwrote paramstyle to qmark when creating a Snowpark session.

  • Fixed a bug where df.join(..., how="cross") failed with SnowparkJoinException: (1112): Unsupported using join type 'Cross'.

  • Fixed a bug where querying a DataFrame column created from chained function calls used a wrong column name.

Version 1.1.0 (2023-01-26)

New features and updates

  • Added asc, asc_nulls_first, asc_nulls_last, desc, desc_nulls_first, desc_nulls_last, date_part, and unix_timestamp in functions.

  • Added the property DataFrame.dtypes to return a list of column name and data type pairs.

  • Added the following aliases:

    • functions.expr() for functions.sql_expr().

    • functions.date_format() for functions.to_date().

    • functions.monotonically_increasing_id() for functions.seq8().

    • functions.from_unixtime() for functions.to_timestamp().

Improvements

  • The session parameter PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER will be True after Snowflake 7.3 is released. In snowpark-python, session.sql_simplifier_enabled reads the value of PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER by default, meaning that the SQL simplifier is enabled by default after the Snowflake 7.3 release. To turn this off, set PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER in Snowflake to False or run session.sql_simplifier_enabled = False from Snowpark. It is recommended to use the SQL simplifier because it helps to generate more concise SQL.
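
If the SQL simplifier needs to be turned off only for part of a program, the session.sql_simplifier_enabled attribute can be toggled and then restored. The context manager below is a hypothetical convenience wrapper rather than part of Snowpark, and it assumes an existing session object.

```python
from contextlib import contextmanager


@contextmanager
def sql_simplifier_disabled(session):
    """Temporarily set session.sql_simplifier_enabled to False."""
    previous = session.sql_simplifier_enabled
    session.sql_simplifier_enabled = False
    try:
        yield session
    finally:
        # Restore the earlier value even if the body raises.
        session.sql_simplifier_enabled = previous
```

Inside a with sql_simplifier_disabled(session): block, queries are generated without the simplifier, and the earlier setting returns when the block exits.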