Snowpark Library for Python release notes for 2023¶
This article contains the release notes for the Snowpark Library for Python, including the following when applicable:
Behavior changes
New features
Customer-facing bug fixes
Snowflake uses semantic versioning for Snowpark Library for Python updates.
See Snowpark Developer Guide for Python for documentation.
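Because releases follow semantic versioning, version strings should be compared numerically, part by part, rather than as plain text. The following is a minimal sketch of that comparison; the helper name parse_version is made up for this illustration and is not part of the Snowpark library.

```python
def parse_version(version: str) -> tuple:
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


releases = ["1.9.0", "1.11.1", "1.10.0"]

# Plain string sorting compares character by character, so "1.9.0" lands last.
print(sorted(releases))

# Sorting on the parsed tuples orders the releases from oldest to newest.
print(sorted(releases, key=parse_version))
```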
Version 1.11.1 (2023-12-07)¶
Version 1.11.1 of the Snowpark library introduces some new features.
New features¶
Added the conn_error attribute to SnowflakeSQLException, which stores the whole underlying exception from snowflake-connector-python.
Added support for RelationalGroupedDataFrame.pivot() to access pivot in the following pattern: DataFrame.group_by(...).pivot(...).
Added the experimental feature, Local Testing Mode, which allows you to create and operate on Snowpark Python DataFrames locally without connecting to a Snowflake account. You can use the local testing framework to test your DataFrame operations locally, on your development machine or in a CI (continuous integration) pipeline, before deploying code changes to your account.
Added support for the new arrays_to_object function in snowflake.snowpark.functions.
Added support for the vector data type.
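As an illustration of Local Testing Mode, the sketch below creates a session with the local_testing configuration option and runs a small filter without any account credentials. The function name run_local_example and the sample data are assumptions made for this example. Because the feature is experimental, the set of supported operations and any optional dependencies it needs (such as pandas) may vary by version.

```python
def run_local_example():
    """Build a local-testing session and filter a small DataFrame."""
    # Imported inside the function so this module still loads on machines
    # where snowflake-snowpark-python is not installed.
    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col

    # "local_testing" switches the session to the experimental local mode,
    # so no connection parameters are supplied.
    session = Session.builder.config("local_testing", True).create()

    # Hypothetical sample data for the illustration.
    df = session.create_dataframe(
        [[1, "a"], [2, "b"], [3, "c"]], schema=["id", "label"]
    )
    return df.filter(col("id") > 1).collect()
```

Calling run_local_example() in a unit test or a CI job should return the rows whose id is greater than 1, without opening a connection to Snowflake.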
Dependency updates¶
Bumped the cloudpickle dependency to work with cloudpickle==2.2.1.
Updated snowflake-connector-python to version 3.4.0.
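To confirm which dependency versions are actually installed in an environment, one option is the standard library's importlib.metadata module. This is a sketch only; the helper name installed_version is an assumption for this example.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of the library and the dependencies named above.
for name in ("snowflake-snowpark-python", "snowflake-connector-python", "cloudpickle"):
    print(name, installed_version(name))
```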
Bug fixes¶
DataFrame column names quoting check now supports newline characters.
Fixed a bug where a DataFrame generated by session.read.with_metadata created an inconsistent table when doing df.write.save_as_table.
Version 1.10.0 (2023-11-03)¶
Version 1.10.0 of the Snowpark library introduces some new features.
New features¶
Added support for managing case sensitivity in DataFrame.to_local_iterator().
Added support for specifying vectorized UDTF’s input column names by using the optional parameter input_names in UDTFRegistration.register, UDTFRegistration.register_file, and functions.pandas_udtf. By default, RelationalGroupedDataFrame.applyInPandas will infer the column names from the current DataFrame schema.
Added sql_error_code and raw_message attributes to SnowflakeSQLException when it is caused by a SQL exception.
Bug fixes¶
Fixed a bug in DataFrame.to_pandas() where converting Snowpark DataFrames to pandas DataFrames was losing precision on integers with more than 19 digits.
Fixed a bug in session.add_packages where it could not handle a requirement specifier that contained a project name with an underscore and a version.
Fixed a bug in DataFrame.limit() when offset is used and the parent DataFrame uses limit. Now the offset won’t impact the parent DataFrame’s limit.
Fixed a bug in DataFrame.write.save_as_table where DataFrames created from the read API could not save data into Snowflake because of an invalid column name $1.
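The 19-digit threshold in the to_pandas() fix matches the size of a signed 64-bit integer. The sketch below uses plain Python to show how a larger integer is silently altered when it passes through a 64-bit float, while an arbitrary-precision type keeps every digit. It illustrates the general class of problem and does not describe how the library implements the fix.

```python
from decimal import Decimal

# A 23-digit integer, chosen arbitrarily for the illustration.
big = 12345678901234567890123

# The largest signed 64-bit integer has 19 digits.
INT64_MAX = 2**63 - 1
print(len(str(INT64_MAX)), big > INT64_MAX)

# Routing the value through a 64-bit float changes it without raising an error.
print(int(float(big)) == big)

# Python integers and Decimal preserve the exact value.
print(int(Decimal(big)) == big)
```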
Behavior changes¶
Changed the behavior of date_format:
The format argument changed from optional to required.
The returned result changed from a date object to a date-formatted string.
When a window function or a sequence-dependent data generator function (normal, zipf, uniform, seq1, seq2, seq4, seq8) is used, the sort and filter operation will no longer be flattened when generating the query.
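One way to see why flattening is unsafe around window functions is to note that a filter gives different results depending on whether it runs before or after the window function is evaluated. The sketch below models this with plain Python lists; the add_row_number helper and the sample rows are assumptions for illustration and do not reflect the library's query generation.

```python
rows = [
    {"name": "a", "score": 10},
    {"name": "b", "score": 30},
    {"name": "c", "score": 20},
]


def add_row_number(data):
    """Mimic ROW_NUMBER() OVER (ORDER BY score DESC) on a list of dicts."""
    ordered = sorted(data, key=lambda r: r["score"], reverse=True)
    return [{**r, "rn": i} for i, r in enumerate(ordered, start=1)]


# Number the rows first, then filter: the surviving rows keep ranks 2 and 3.
numbered_then_filtered = [r for r in add_row_number(rows) if r["score"] < 30]

# Filter first, then number: the same rows now get ranks 1 and 2.
filtered_then_numbered = add_row_number([r for r in rows if r["score"] < 30])

print([(r["name"], r["rn"]) for r in numbered_then_filtered])
print([(r["name"], r["rn"]) for r in filtered_then_numbered])
```

Merging the filter into the same query level as the window function would amount to switching between these two orderings, which is why keeping the operations in separate levels preserves the intended result.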
Version 1.9.0 (2023-10-16)¶
Version 1.9.0 of the Snowpark library introduces some new features.
New features¶
Added support for the Python 3.11 runtime environment.
Added support for PythonObjJSONEncoder JSON-serializable objects for ARRAY and OBJECT literals.
Dependency updates¶
Re-added the typing-extensions dependency.
Bug fixes¶
Fixed a bug where imports from permanent stage locations were ignored for temporary stored procedures, UDTFs, UDFs, and UDAFs.
Reverted to using the CTAS (CREATE TABLE AS SELECT) statement for DataFrameWriter.save_as_table, which does not need insert permission for writing tables.
Version 1.8.0 (2023-09-14)¶
Version 1.8.0 of the Snowpark library introduces some new features.
New features¶
Added support for VOLATILE and IMMUTABLE keywords when registering UDFs.
Added support for specifying clustering keys when saving DataFrames using DataFrame.save_as_table.
Added support for Iterable objects as input for schema when creating DataFrames using Session.create_dataframe.
Added the DataFrame.session property to return a Session object.
Added the Session.session_id property to return an integer that represents the session ID.
Added the Session.connection property to return a SnowflakeConnection object.
Added support for creating a Snowpark session from a configuration file or environment variables.
Dependency updates¶
Updated snowflake-connector-python to version 3.2.0.
Bug fixes¶
Fixed a bug where an automatic package upload would raise ValueError even when compatible package versions were added in session.add_packages.
Fixed a bug where table stored procedures were not registered correctly when using register_from_file.
Fixed a bug where DataFrame joins failed with an invalid_identifier error.
Fixed a bug where DataFrame.copy disabled the SQL simplifier for the returned copy.
Fixed a bug where session.sql().select() would fail if any parameters were specified to session.sql().
Version 1.7.0 (2023-08-28)¶
Version 1.7.0 of the Snowpark library introduces some new features.
Behavior changes¶
When creating stored procedures, UDFs, UDTFs, and UDAFs with the parameter is_permanent=False, temporary objects are created even when stage_name is provided. The default value of is_permanent is False, so users who do not explicitly set this value to True for permanent objects will notice a change in behavior.
types.StructField now enquotes the column identifier by default.
New features¶
Added parameters external_access_integrations and secrets that can be used when creating a UDF, UDTF, or stored procedure from Snowpark Python to allow integration with external access.
Added support for these new functions in snowflake.snowpark.functions: array_flatten and flatten.
Added support for apply_in_pandas in snowflake.snowpark.relational_grouped_dataframe.
Added support for replicating your local Python environment on Snowflake via Session.replicate_local_environment.
Bug fixes¶
Fixed a bug where session.create_dataframe failed to properly set nullable columns when nullability was affected by order or when data was given.
Fixed a bug where DataFrame.select could not identify and alias columns when using table functions whose output columns overlapped with columns in the DataFrame.
Version 1.6.1 (2023-08-02)¶
Behavior changes¶
DataFrameWriter.save_as_table now respects the nullable field of the schema provided by the user, or of the inferred schema based on data from user input.
New features¶
Added support for new functions in snowflake.snowpark.functions:
array_sort
sort_array
array_min
array_max
explode_outer
Added support for pure Python packages specified via Session.add_requirements or Session.add_packages. They are now usable in stored procedures and UDFs even if the packages are not present on the Snowflake Anaconda channel.
Added the Session parameters custom_packages_upload_enabled and custom_packages_force_upload_enabled to enable support for the pure Python packages feature mentioned above. Both parameters default to False.
Added support for specifying package requirements by passing a conda environment YAML file to Session.add_requirements.
Added support for asynchronous execution of multi-query DataFrames that contain binding variables.
Added support for renaming multiple columns in DataFrame.rename.
Added support for Geometry datatypes.
Added support for params in session.sql() in stored procedures.
Added support for user-defined aggregate functions (UDAFs). This feature is currently in private preview.
Added support for vectorized user-defined table functions (vectorized UDTFs). This feature is currently in public preview.
Added support for Snowflake Timestamp variants (i.e., TIMESTAMP_NTZ, TIMESTAMP_LTZ, TIMESTAMP_TZ):
Added TimestampTimezone as an argument in the TimestampType constructor.
Added the type hints NTZ, LTZ, TZ, and Timestamp to annotate functions when registering UDFs.
Improvements¶
Removed redundant dependency typing-extensions.
DataFrame.cache_result now creates a temp table with a fully qualified name under the current database and schema.
Bug fixes¶
Fixed a bug where type check happens on pandas before it is imported.
Fixed a bug when creating a UDF from numpy.ufunc.
Fixed a bug where DataFrame.union was not generating the correct Selectable.schema_query when the SQL simplifier is enabled.
Dependency updates¶
Updated snowflake-connector-python to version 3.0.4.
Version 1.5.1 (2023-06-20)¶
New features and updates¶
Added support for the Python 3.10 runtime environment.
Version 1.5.0 (2023-06-13)¶
Behavior changes¶
Aggregation results from functions such as DataFrame.agg and DataFrame.describe no longer strip away non-printing characters from column names.
New features and updates¶
Added support for the Python 3.9 runtime environment.
Added support for new functions in snowflake.snowpark.functions:
array_generate_range
array_unique_agg
collect_set
sequence
Added support for registering and calling stored procedures with the TABLE return type.
Added support for the length parameter in StringType() to specify the maximum number of characters that can be stored by the column.
Added the alias functions.element_at() for functions.get().
Added the alias Column.contains for functions.contains.
Added the experimental feature DataFrame.alias.
Added support for querying metadata columns from a stage when creating a DataFrame using DataFrameReader.
Added support for StructType.add to append more fields to existing StructType objects.
Added support for the execute_as parameter in StoredProcedureRegistration.register_from_file() to specify stored procedure caller rights.
Bug fixes¶
Fixed a bug where DataFrame.join_table_function did not run all of the necessary queries to set up the join table function when the SQL simplifier was enabled.
Fixed type hint declarations for the custom types ColumnOrName, ColumnOrLiteralStr, ColumnOrSqlExpr, LiteralType, and ColumnOrLiteral that were breaking mypy checks.
Fixed a bug where DataFrameWriter.save_as_table and DataFrame.copy_into_table failed to parse fully qualified table names.
Version 1.4.0 (2023-04-24)¶
New features¶
Added support for session.getOrCreate.
Added support for alias Column.getField.
Added support for new functions in snowflake.snowpark.functions:
date_add and date_sub to make add and subtract operations easier.
daydiff
explode
array_distinct
regexp_extract
struct
format_number
bround
substring_index
Added parameter skip_upload_on_content_match when creating UDFs, UDTFs, and stored procedures using register_from_file to skip uploading files to a stage if the same version of the files is already on the stage.
Added support for the DataFrame.save_as_table method to take table names that contain dots.
Flattened generated SQL when DataFrame.filter() or DataFrame.order_by() is followed by a projection statement (e.g. DataFrame.select(), DataFrame.with_column()).
Added support for creating dynamic tables (in private preview) using DataFrame.create_or_replace_dynamic_table.
Added an optional argument, params, in session.sql() to support binding variables. Note that this argument is not supported in stored procedures yet.
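The params argument of session.sql() can be used roughly as in the sketch below, where each ? placeholder is bound in order to one entry of the list. The function name find_orders, the orders table, and its columns are hypothetical names chosen for this example.

```python
def find_orders(session, customer_id, min_total):
    """Build a parameterized query; `session` is a snowflake.snowpark.Session."""
    # Each ? placeholder is bound, in order, to one entry of params, so the
    # values are never spliced into the SQL text by hand.
    return session.sql(
        "SELECT * FROM orders WHERE customer_id = ? AND total >= ?",
        params=[customer_id, min_total],
    )
```

Binding values this way avoids building SQL through string concatenation, which keeps quoting correct and reduces the risk of SQL injection.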
Bug fixes¶
Fixed a bug in strtok_to_array where an exception was thrown when a delimiter was passed in.
Fixed a bug in session.add_import where the module had the same namespace as other dependencies.
Version 1.3.0 (2023-03-28)¶
New features¶
Added support for the delimiters parameter in functions.initcap().
Added support for functions.hash() to accept a variable number of input expressions.
Added API Session.conf for getting, setting, or checking the mutability of any runtime configuration.
Added support for managing case sensitivity in Row results from DataFrame.collect using the case_sensitive parameter.
Added indexer support for snowflake.snowpark.types.StructType.
Added a keyword argument log_on_exception to DataFrame.collect and DataFrame.collect_no_wait to optionally disable error logging for SQL exceptions.
Bug fixes¶
Fixed a bug where a DataFrame set operation (DataFrame.subtract, DataFrame.union, etc.) called after another DataFrame set operation and DataFrame.select or DataFrame.with_column threw an exception.
Fixed a bug where chained sort statements were overwritten by the SQL simplifier.
Improvements¶
Simplified JOIN queries to use constant subquery aliases (SNOWPARK_LEFT, SNOWPARK_RIGHT) by default. Users can disable this at runtime with session.conf.set('use_constant_subquery_alias', False) to use randomly generated alias names instead.
Allowed specifying statement parameters in session.call().
Enabled the uploading of large pandas DataFrames in stored procedures by defaulting to a chunk size of 100,000 rows.
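To give a sense of what a 100,000-row chunk size means for a large upload, the sketch below splits a sequence of rows into batches of at most that size. It is a generic chunking helper written for illustration (iter_chunks is a made-up name) and does not describe the library's internal upload code.

```python
from itertools import islice


def iter_chunks(rows, chunk_size=100_000):
    """Yield successive lists containing at most chunk_size rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# 250,000 rows are split into two full chunks and one partial chunk.
sizes = [len(chunk) for chunk in iter_chunks(range(250_000))]
print(sizes)
```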
Version 1.2.0 (2023-03-02)¶
New features and updates¶
Added support for displaying source code as comments in the generated scripts when registering stored procedures. This is enabled by default; to turn it off, specify source_code_display=False at registration.
Added a parameter if_not_exists when creating a UDF, UDTF, or stored procedure from Snowpark Python to skip creating the specified function or procedure if it already exists.
Accept integers when calling snowflake.snowpark.functions.get to extract a value from an array.
Added functions.reverse to open access to the Snowflake built-in function REVERSE.
Added parameter require_scoped_url in snowflake.snowpark.files.SnowflakeFile.open() (in private preview) to replace is_owner_file, which is marked for deprecation.
Bug fixes¶
Fixed a bug that overwrote paramstyle to qmark when creating a Snowpark session.
Fixed a bug where df.join(..., how="cross") failed with SnowparkJoinException: (1112): Unsupported using join type 'Cross'.
Fixed a bug where querying a DataFrame column created from chained function calls used a wrong column name.
Version 1.1.0 (2023-01-26)¶
New features and updates¶
Added asc, asc_nulls_first, asc_nulls_last, desc, desc_nulls_first, desc_nulls_last, date_part, and unix_timestamp in functions.
Added the property DataFrame.dtypes to return a list of column name and data type pairs.
Added the following aliases:
functions.expr() for functions.sql_expr().
functions.date_format() for functions.to_date().
functions.monotonically_increasing_id() for functions.seq8().
functions.from_unixtime() for functions.to_timestamp().
Bug fixes¶
Fixed a bug in SQL simplifier that didn’t handle Column alias and join well in some cases. See https://github.com/snowflakedb/snowpark-python/issues/658 for details.
Fixed a bug in SQL simplifier that generated wrong column names for function calls, NaN and INF.
Improvements¶
The session parameter PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER will be True after Snowflake 7.3 is released. In snowpark-python, session.sql_simplifier_enabled reads the value of PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER by default, meaning that the SQL simplifier is enabled by default after the Snowflake 7.3 release. To turn this off, set PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER in Snowflake to False or run session.sql_simplifier_enabled = False from Snowpark. It is recommended to use the SQL simplifier because it helps to generate more concise SQL.