Snowpark Library for Python release notes for 2024
This article contains the release notes for the Snowpark Library for Python, including the following when applicable:
- Behavior changes
- New features
- Customer-facing bug fixes
Snowflake uses semantic versioning for Snowpark Library for Python updates.
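Semantic versioning means each release number has the shape MAJOR.MINOR.PATCH, and versions compare numerically per component rather than as strings (so 1.9.x would precede 1.16.0). A minimal pure-Python sketch of ordering such version strings (the helper name is illustrative):

```python
# Illustrative only: ordering semantic-version strings such as the
# release numbers in these notes.
def semver_key(version: str) -> tuple:
    # Split "MAJOR.MINOR.PATCH" into a tuple of ints for numeric comparison.
    return tuple(int(part) for part in version.split("."))

releases = ["1.16.0", "1.12.1", "1.14.0", "1.12.0", "1.15.0", "1.13.0"]
ordered = sorted(releases, key=semver_key)
print(ordered)  # oldest to newest
```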
Version 1.16.0 (2024-05-08)
Version 1.16.0 of the Snowpark library introduces some new features.
New features
- Added `snowflake.snowpark.Session.lineage.trace` to explore data lineage of Snowflake objects.
- Added support for registering stored procedures with packages given as Python modules.
- Added support for structured type schema parsing.
Bug fixes
- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
Local testing updates
New features
- Added support for `StringType`, `TimestampType`, and `VariantType` data conversion in the mocked function `to_date`.
- Added support for the following APIs:
  - `snowflake.snowpark.functions`:
    - `get`
    - `concat`
    - `concat_ws`
Bug fixes
- Fixed a bug that caused `NaT` and `NaN` values to not be recognized.
- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
- Fixed a bug where `DataFrameReader.csv` was unable to handle quoted values containing a delimiter.
- Fixed a bug where a `None` value in an arithmetic calculation produced `math.nan` in the output instead of remaining `None`.
- Fixed a bug in the functions `sum` and `covar_pop` where a `math.nan` value in the data did not propagate to the output as `math.nan`.
- Fixed a bug where stage operations could not handle directories.
- Fixed a bug where `DataFrame.to_pandas` did not return Snowflake numeric types with precision 38 as `int64`.
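The two arithmetic fixes above pin down SQL-style null and NaN semantics: a `None` (SQL NULL) operand makes the whole result `None`, while a `math.nan` value in the data propagates through aggregates such as `sum`. A minimal pure-Python sketch of those semantics (the helper names are illustrative, not Snowpark APIs):

```python
import math

def null_safe_add(a, b):
    # SQL-style semantics: NULL (None) in an arithmetic expression
    # yields NULL, never NaN.
    if a is None or b is None:
        return None
    return a + b

def nan_propagating_sum(values):
    # SQL SUM skips NULLs, but a NaN in the data makes the result NaN.
    total = 0.0
    for v in values:
        if v is None:
            continue
        if math.isnan(v):
            return math.nan
        total += v
    return total

print(null_safe_add(1, None))                     # None
print(nan_propagating_sum([1.0, None, 2.0]))      # 3.0
```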
Version 1.15.0 (2024-04-24)
Version 1.15.0 of the Snowpark library introduces some new features.
New features
- Added the `truncate` save mode in `DataFrameWriter` to overwrite existing tables by truncating the underlying table instead of dropping it.
- Added telemetry to calculate query plan height and number of duplicate nodes during collect operations.
- Added the functions below to unload data from a `DataFrame` into one or more files in a stage:
  - `DataFrame.write.json`
  - `DataFrame.write.csv`
  - `DataFrame.write.parquet`
- Added distributed tracing using OpenTelemetry APIs for action functions in `DataFrame` and `DataFrameWriter`:
  - `snowflake.snowpark.DataFrame`:
    - `collect`
    - `collect_nowait`
    - `to_pandas`
    - `count`
    - `show`
  - `snowflake.snowpark.DataFrameWriter`:
    - `save_as_table`
- Added support for `snow://` URLs to `snowflake.snowpark.Session.file.get` and `snowflake.snowpark.Session.file.get_stream`.
- Added support to register stored procedures and UDFs with a `comment`.
- UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.
- Added support for dynamic pivot. This feature is currently in private preview.
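The point of the `truncate` save mode is that truncating keeps the existing table object (and therefore metadata such as grants) while replacing its rows, whereas a drop-and-recreate overwrite does not. A toy in-memory sketch of that contrast (not Snowpark code; all names are illustrative):

```python
# Toy model of a table catalog contrasting "overwrite by drop-and-recreate"
# with a "truncate" save mode. Illustrative only.
catalog = {"t1": {"grants": ["role_a"], "rows": [1, 2, 3]}}

def save_overwrite(name, rows):
    # Drop and recreate: grants on the old table object are lost.
    catalog[name] = {"grants": [], "rows": list(rows)}

def save_truncate(name, rows):
    # Truncate: keep the table object (grants survive), replace the rows.
    catalog[name]["rows"] = list(rows)

save_truncate("t1", [9])
print(catalog["t1"])  # grants preserved, rows replaced
```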
Improvements
- Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). This is still an experimental feature and is not enabled by default. You can enable it by setting `session.cte_optimization_enabled` to `True`.
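To illustrate what this rewrite does, here is the shape of the transformation on a query that repeats the same subquery twice (plain SQL strings; a conceptual sketch, not the library's internal rewriting):

```python
# Conceptual before/after of converting a duplicated subquery into a CTE.
duplicated = (
    "SELECT * FROM (SELECT a, b FROM t WHERE a > 1) x "
    "JOIN (SELECT a, b FROM t WHERE a > 1) y ON x.a = y.a"
)
with_cte = (
    "WITH sub AS (SELECT a, b FROM t WHERE a > 1) "
    "SELECT * FROM sub x JOIN sub y ON x.a = y.a"
)
# The subquery body now appears once instead of twice, so it is
# compiled (and potentially evaluated) only once.
print(duplicated.count("SELECT a, b FROM t"))  # 2
print(with_cte.count("SELECT a, b FROM t"))    # 1
```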
Bug fixes
- Fixed a bug where `statement_params` was not passed to query executions that register stored procedures and user-defined functions.
- Fixed a bug causing `snowflake.snowpark.Session.file.get_stream` to fail for quoted stage locations.
- Fixed a bug where an internal type hint in `utils.py` might raise `AttributeError` when the underlying module cannot be found.
Local testing updates
New features
- Added support for registering UDFs and stored procedures.
- Added support for the following APIs:
  - `snowflake.snowpark.Session`:
    - `file.put`
    - `file.put_stream`
    - `file.get`
    - `file.get_stream`
    - `read.json`
    - `add_import`
    - `remove_import`
    - `get_imports`
    - `clear_imports`
    - `add_packages`
    - `add_requirements`
    - `clear_packages`
    - `remove_package`
    - `udf.register`
    - `udf.register_from_file`
    - `sproc.register`
    - `sproc.register_from_file`
  - `snowflake.snowpark.functions`:
    - `current_database`
    - `current_session`
    - `date_trunc`
    - `object_construct`
    - `object_construct_keep_null`
    - `pow`
    - `sqrt`
    - `udf`
    - `sproc`
- Added support for `StringType`, `TimestampType`, and `VariantType` data conversion in the mocked function `to_time`.
Bug fixes
- Fixed a bug where constant functions produced null-filled columns.
- Fixed `to_object`, `to_array`, and `to_binary` to better handle null inputs.
- Fixed a bug where timestamp data comparison could not handle years beyond 2262.
- Fixed a bug where `Session.builder.getOrCreate` did not return the created mock session.
Version 1.14.0 (2024-03-20)
Version 1.14.0 of the Snowpark library introduces some new features.
New features
- Added support for creating vectorized UDTFs with the `process` method.
- Added support for the following DataFrame functions:
  - `to_timestamp_ltz`
  - `to_timestamp_ntz`
  - `to_timestamp_tz`
  - `locate`
- Added support for the ASOF JOIN type.
- Added support for the following local testing APIs:
  - `snowflake.snowpark.functions`:
    - `to_double`
    - `to_timestamp`
    - `to_timestamp_ltz`
    - `to_timestamp_ntz`
    - `to_timestamp_tz`
    - `greatest`
    - `least`
    - `convert_timezone`
    - `dateadd`
    - `date_part`
  - `snowflake.snowpark.Session`:
    - `get_current_account`
    - `get_current_warehouse`
    - `get_current_role`
    - `use_schema`
    - `use_warehouse`
    - `use_database`
    - `use_role`
Improvements
- Added telemetry to local testing.
- Improved the error message of `DataFrameReader` to raise a `FileNotFound` error when reading a path that does not exist or when there are no files under the path.
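The improved behavior mirrors Python's built-in `FileNotFoundError`: fail early with a clear message rather than later with an opaque one. A minimal sketch of the validation pattern (the function name is illustrative, not the Snowpark implementation):

```python
import os

def read_path(path: str) -> list:
    # Raise early with a clear error when the path is missing or empty,
    # instead of failing later inside the read logic.
    if not os.path.exists(path):
        raise FileNotFoundError(f"Path does not exist: {path}")
    files = os.listdir(path) if os.path.isdir(path) else [path]
    if not files:
        raise FileNotFoundError(f"No files found under path: {path}")
    return files
```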
Bug fixes
- Fixed a bug in `SnowflakePlanBuilder` where `save_as_table` did not correctly filter columns whose names start with `$` followed by a number.
- Fixed a bug where statement parameters might have no effect when resolving imports and packages.
- Fixed bugs in local testing:
  - LEFT ANTI and LEFT SEMI joins dropped rows with null values.
  - `DataFrameReader.csv` incorrectly parsed data when the optional parameter `field_optionally_enclosed_by` was specified.
  - `Column.regexp` only considered the first entry when `pattern` was a `Column`.
  - `Table.update` raised `KeyError` when updating null values in the rows.
  - VARIANT columns raised errors at `DataFrame.collect`.
  - `count_distinct` did not work correctly when counting.
  - Null values in integer columns raised `TypeError`.
Version 1.13.0 (2024-02-26)
Version 1.13.0 of the Snowpark library introduces some new features.
New features
- Added support for an optional `date_part` argument in the function `last_day`.
- `SessionBuilder.app_name` will set the `query_tag` after the session is created.
- Added support for the following local testing functions:
  - `current_timestamp`
  - `current_date`
  - `current_time`
  - `strip_null_value`
  - `upper`
  - `lower`
  - `length`
  - `initcap`
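For context, `last_day` returns the last day of the period containing the input date, and the optional `date_part` argument selects that period. A pure-Python sketch of the month and year cases (illustrative, not the Snowpark implementation):

```python
import calendar
import datetime

def last_day(d: datetime.date, date_part: str = "month") -> datetime.date:
    # Return the last day of the period (month or year) containing `d`.
    if date_part == "month":
        _, n_days = calendar.monthrange(d.year, d.month)
        return d.replace(day=n_days)
    if date_part == "year":
        return datetime.date(d.year, 12, 31)
    raise ValueError(f"unsupported date_part: {date_part}")

print(last_day(datetime.date(2024, 2, 10)))          # 2024-02-29 (leap year)
print(last_day(datetime.date(2024, 2, 10), "year"))  # 2024-12-31
```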
Improvements
- Added cleanup logic at interpreter shutdown to close all active sessions.
Bug fixes
- Fixed a bug in `DataFrame.to_local_iterator` where the iterator could yield wrong results if another query was executed before the iterator finished, due to a wrong isolation level.
- Fixed a bug that truncated table names in error messages while running a plan with local testing enabled.
- Fixed a bug where `Session.range` returned an empty result when the range was large.
Version 1.12.1 (2024-02-08)
Version 1.12.1 of the Snowpark library introduces some new features.
Improvements
- Use `split_blocks=True` by default during `to_pandas` conversion, for optimal memory allocation. This parameter is passed to `pyarrow.Table.to_pandas`, which enables PyArrow to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.
Bug fixes
- Fixed a bug in `DataFrame.to_pandas` that caused an error when evaluating a DataFrame with an `IntegerType` column with null values.
Version 1.12.0 (2024-01-29)
Version 1.12.0 of the Snowpark library introduces some new features.
Behavior changes (API compatible)
- When parsing data types during a `to_pandas` operation, we rely on the GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as `int8` gets returned as `int64`. Users can fix this by explicitly specifying precision values for their return column.
- Aligned behavior for `Session.call` in the case of table stored procedures, where running `Session.call` would not trigger a stored procedure unless a `collect()` operation was performed.
- `StoredProcedureRegistration` now automatically adds `snowflake-snowpark-python` as a package dependency, with the client's local version of the library. An error is thrown if the server cannot support that version.
New features
- Exposed `statement_params` in `StoredProcedure.__call__`.
- Added two optional arguments to `Session.add_import`:
  - `chunk_size`: The number of bytes to hash per chunk of the uploaded files.
  - `whole_file_hash`: By default, only the first chunk of the uploaded import is hashed to save time. When this is set to `True`, each uploaded file is fully hashed instead.
- Added parameters `external_access_integrations` and `secrets` when creating a UDAF from Snowpark Python to allow integration with external access.
- Added a new method `Session.append_query_tag`, which allows an additional tag to be added to the current query tag by appending it as a comma-separated value.
- Added a new method `Session.update_query_tag`, which allows updates to a JSON-encoded dictionary query tag.
- `SessionBuilder.getOrCreate` will now attempt to replace the singleton it returns when token expiration has been detected.
- Added the following functions in `snowflake.snowpark.functions`:
  - `array_except`
  - `create_map`
  - `sign`/`signum`
- Added the following functions to `DataFrame.analytics`:
  - The `moving_agg` function, to enable moving aggregations, such as sums and averages, with multiple window sizes.
  - The `cumulative_agg` function, to enable cumulative aggregations, such as sums and averages, on the current window.
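The two `Session.add_import` arguments above control how uploaded files are fingerprinted: hashing only the first chunk is fast but can collide for files that share a prefix, while hashing the whole file is accurate. A pure-Python sketch of that trade-off (the helper is illustrative, not Snowpark's implementation):

```python
import hashlib
import io

def file_digest(stream, chunk_size: int = 8192, whole_file_hash: bool = False) -> str:
    # Hash the first `chunk_size` bytes, or every chunk when
    # whole_file_hash=True.
    h = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
        if not whole_file_hash:
            break  # fast path: fingerprint from the first chunk only
    return h.hexdigest()

data_a = b"shared-prefix" + b"A" * 10_000
data_b = b"shared-prefix" + b"B" * 10_000
# First-chunk hashing collides when files share a prefix...
print(file_digest(io.BytesIO(data_a), chunk_size=8) ==
      file_digest(io.BytesIO(data_b), chunk_size=8))          # True
# ...whole-file hashing tells them apart.
print(file_digest(io.BytesIO(data_a), whole_file_hash=True) ==
      file_digest(io.BytesIO(data_b), whole_file_hash=True))  # False
```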
Bug fixes
- Fixed a bug in `DataFrame.na.fill` that caused Boolean values to erroneously override integer values.
- Fixed a bug in `Session.create_dataframe` where Snowpark DataFrames created from pandas DataFrames did not infer the type of timestamp columns correctly. The behavior is as follows:
  - Earlier, timestamp columns without a timezone would be converted to nanosecond epochs and inferred as `LongType()`; they are now correctly maintained as timestamp values and inferred as `TimestampType(TimestampTimeZone.NTZ)`.
  - Earlier, timestamp columns with a timezone would be inferred as `TimestampType(TimestampTimeZone.NTZ)` and lose timezone information; they are now correctly inferred as `TimestampType(TimestampTimeZone.LTZ)`, and timezone information is retained correctly.
  - Set the session parameter `PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME` to revert to the old behavior. Snowflake recommends that you update your code to align with the correct behavior, because the parameter will be removed in the future.
- Fixed a bug where `DataFrame.to_pandas` created an object dtype in pandas for decimal types when the scale is not 0. Instead, we cast the value to a float64 type.
- Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
  - `DataFrame.filter()` is called after `DataFrame.sort().limit()`.
  - `DataFrame.sort()` or `filter()` is called on a DataFrame that already has a window function or sequence-dependent data generator column. For instance, `df.select("a", seq1().alias("b")).select("a", "b").sort("a")` won't flatten the sort clause anymore.
  - A window or sequence-dependent data generator column is used after `DataFrame.limit()`. For instance, `df.limit(10).select(row_number().over())` won't flatten the limit and select in the generated SQL.
- Fixed a bug where aliasing a DataFrame column raised an error when the DataFrame was copied from another DataFrame with an aliased column. For instance:

      df = df.select(col("a").alias("b"))
      df = copy(df)
      df.select(col("b").alias("c"))  # Threw an error. Now it's fixed.

- Fixed a bug in `Session.create_dataframe` where a non-nullable field in a schema was not respected for the Boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.
- Fixed a bug in the SQL simplifier where non-select statements in `session.sql` dropped a SQL query when used with `limit()`.
- Fixed a bug that raised an exception when the session parameter `ERROR_ON_NONDETERMINISTIC_UPDATE` is true.