Snowpark Library for Python release notes for 2024¶
This article contains the release notes for the Snowpark Library for Python, including the following when applicable:
Behavior changes
New features
Customer-facing bug fixes
Snowflake uses semantic versioning for Snowpark Library for Python updates.
Version 1.20.0 (2024-07-17)¶
Version 1.20.0 of the Snowpark Library for Python introduces some new features.
New features¶
- Added distributed tracing using OpenTelemetry APIs for table stored procedure functions in `DataFrame`: `_execute_and_get_query_id`.
- Added support for the `arrays_zip` function.
- Improved performance for binary column expressions and `df._in` by avoiding unnecessary casts for numeric values. You can enable this optimization by setting `session.eliminate_numeric_sql_value_cast_enabled = True`.
- Improved error messages for `write_pandas` when the target table does not exist and `auto_create_table=False`.
- Added OpenTelemetry tracing on UDxF functions in Snowpark.
- Added OpenTelemetry tracing on stored procedure registration in Snowpark.
- Added a new optional parameter called `format_json` to the `Session.SessionBuilder.app_name` function that sets the app name in the `Session.query_tag` in JSON format. By default, this parameter is set to `False`.
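For illustration, the sketch below shows what the `format_json` switch changes about the tag string. The helper `build_query_tag` and the exact `APPNAME` tag layout are assumptions for illustration only, not a Snowpark API:

```python
import json

# Hypothetical helper (not a Snowpark API) illustrating the assumed effect of
# format_json on the query tag; the exact tag layout is an assumption.
def build_query_tag(app_name: str, format_json: bool = False) -> str:
    if format_json:
        # JSON-formatted tag, as with app_name(..., format_json=True)
        return json.dumps({"APPNAME": app_name})
    # Plain-text tag (the default, format_json=False)
    return f"APPNAME={app_name}"

print(build_query_tag("my_app"))                    # APPNAME=my_app
print(build_query_tag("my_app", format_json=True))  # {"APPNAME": "my_app"}
```

A JSON-formatted tag is easier to parse programmatically when filtering query history by tag.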
Bug fixes¶
- Fixed a bug where the SQL generated for `lag(x, 0)` was incorrect and failed with the error message `argument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'`.
Snowpark local testing updates¶
New features¶
- Added support for the following APIs:
  - `snowflake.snowpark.functions`:
    - `random`
- Added new parameters to the `patch` function when registering a mocked function:
  - `distinct` allows an alternate function to be specified for when a SQL function should be distinct.
  - `pass_column_index` passes a named parameter, `column_index`, to the mocked function that contains the `pandas.Index` for the input data.
  - `pass_row_index` passes a named parameter, `row_index`, to the mocked function that is the 0-indexed row number on which the function is currently operating.
  - `pass_input_data` passes a named parameter, `input_data`, to the mocked function that contains the entire input dataframe for the current expression.
- Added support for the `column_order` parameter in the `DataFrameWriter.save_as_table` method.
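As a sketch of how the `pass_row_index` style parameters reach a mocked function, the hypothetical mock below receives `row_index` as a keyword argument; the list comprehension stands in for the local testing framework invoking the mock, and is not Snowpark code:

```python
# Hypothetical mocked function: with pass_row_index=True, the local testing
# framework would supply the 0-indexed row number as the row_index keyword.
def mock_add_row_number(value, *, row_index=None):
    # Combine the cell value with the row currently being processed
    return value + row_index

# Stand-in for the framework calling the mock one row at a time
results = [mock_add_row_number(v, row_index=i) for i, v in enumerate([10, 20, 30])]
print(results)  # [10, 21, 32]
```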
Bug fixes¶
- Fixed a bug that caused `DecimalType` columns to be incorrectly truncated to integer precision when used in `BinaryExpressions`.
Snowpark pandas API Updates¶
New features¶
Added new API support for the following:

- DataFrames:
  - `DataFrame.nlargest` and `DataFrame.nsmallest`
  - `DataFrame.assign`
  - `DataFrame.stack`
  - `DataFrame.pivot`
  - `DataFrame.to_csv`
  - `DataFrame.corr`
  - `DataFrame.equals`
  - `DataFrame.reindex`
  - `DataFrame.at` and `DataFrame.iat`
- Series:
  - `Series.nlargest` and `Series.nsmallest`
  - `Series.at` and `Series.iat`
  - `Series.dt.isocalendar`
  - `Series.equals`
  - `Series.reindex`
  - `Series.to_csv`
  - `Series.case_when`, except when the condition or replacement is callable
  - `Series.plot()`, with the data materialized to the local client
- GroupBy:
  - `DataFrameGroupBy.all` and `DataFrameGroupBy.any`
  - `DataFrameGroupBy` and `SeriesGroupBy` aggregations `first` and `last`
  - `DataFrameGroupBy.get_group`
  - `SeriesGroupBy.all` and `SeriesGroupBy.any`
- General:
  - `pd.pivot`
  - `read_excel` (uses local pandas for processing)
  - `df.plot()`, with the data materialized to the local client
Extended existing APIs as follows:

- Added support for `replace` and `frac > 1` in `DataFrame.sample` and `Series.sample`.
- Added partial support for `Series.str.translate` where the values in the `table` are single-codepoint strings.
- Added support for the `limit` parameter when the `method` parameter is used in `fillna`.

Added documentation pages for `Index` and its APIs.
Bug fixes¶
- Fixed an issue when using `np.where` and `df.where` when the scalar `other` is the literal 0.
- Fixed a bug regarding precision loss when converting to a Snowpark pandas `DataFrame` or `Series` with `dtype=np.uint64`.
- Fixed a bug where `values` is set to `index` when `index` and `columns` contain all columns in the DataFrame during `pivot_table`.
Improvements¶
- Added support for `Index.copy()`.
- Added support for the Index APIs `dtype`, `values`, `item()`, `tolist()`, `to_series()`, and `to_frame()`.
- Expanded support for DataFrames with no rows in `pd.pivot_table` and `DataFrame.pivot_table`.
- Added support for the `inplace` parameter in `DataFrame.sort_index` and `Series.sort_index`.
Version 1.19.0 (2024-06-25)¶
Version 1.19.0 of the Snowpark Library for Python introduces some new features.
New features¶
- Added support for the `to_boolean` function.
- Added documentation pages for `Index` and its APIs.
Bug fixes¶
- Fixed a bug where Python stored procedures with a table return type fail when run in a task.
- Fixed a bug where `df.dropna` fails due to `RecursionError: maximum recursion depth exceeded` when the DataFrame has more than 500 columns.
- Fixed a bug where `AsyncJob.result("no_result")` doesn't wait for the query to finish execution.
Local testing updates¶
New features¶
- Added support for the `strict` parameter when registering UDFs and stored procedures.
Bug fixes¶
- Fixed a bug in `convert_timezone` that made setting the `source_timezone` parameter return an error.
- Fixed a bug where creating a DataFrame with empty data of type `DateType` raises `AttributeError`.
- Fixed a bug where a table merge fails when an update clause exists but no update takes place.
- Fixed a bug in the mock implementation of `to_char` that raises `IndexError` when an incoming column has a nonconsecutive row index.
- Fixed a bug in handling `CaseExpr` expressions that raises `IndexError` when an incoming column has a nonconsecutive row index.
- Fixed a bug in the implementation of `Column.like` that raises `IndexError` when an incoming column has a nonconsecutive row index.
Improvements¶
- Added support for type coercion in the implementation of `DataFrame.replace`, `DataFrame.dropna`, and the mock function `iff`.
Snowpark pandas API updates¶
New features¶
- Added partial support for `DataFrame.pct_change` and `Series.pct_change` without the `freq` and `limit` parameters.
- Added support for `Series.str.get`.
- Added support for `Series.dt.dayofweek`, `Series.dt.day_of_week`, `Series.dt.dayofyear`, and `Series.dt.day_of_year`.
- Added support for `Series.str.__getitem__` (`Series.str[...]`).
- Added support for `Series.str.lstrip` and `Series.str.rstrip`.
- Added support for `DataFrameGroupby.size` and `SeriesGroupby.size`.
- Added support for `DataFrame.expanding` and `Series.expanding` for the aggregations `count`, `sum`, `min`, `max`, `mean`, `std`, and `var` with `axis=0`.
- Added support for `DataFrame.rolling` and `Series.rolling` for the aggregation `count` with `axis=0`.
- Added support for `Series.str.match`.
- Added support for `DataFrame.resample` and `Series.resample` for the aggregation `size`.
Bug fixes¶
- Fixed a bug that causes the output columns of `GroupBy.aggregate` to be ordered incorrectly.
- Fixed a bug where calling `DataFrame.describe` on a frame with duplicate columns of differing `dtypes` could cause an error or incorrect results.
- Fixed a bug in `DataFrame.rolling` and `Series.rolling` so that `window=0` now throws `NotImplementedError` instead of `ValueError`.
Improvements¶
- Added support for named aggregations in `DataFrame.aggregate` and `Series.aggregate` with `axis=0`.
- `pd.read_csv` reads using the native pandas CSV parser, then uploads data to Snowflake using Parquet. This enables most of the parameters supported by `read_csv`, including date parsing and numeric conversions. Uploading via Parquet is roughly twice as fast as uploading via CSV.
- Initial work to support a `pd.Index` directly in Snowpark pandas. Support for `pd.Index` as a first-class component of Snowpark pandas is under active development.
- Added a lazy index constructor and support for `len`, `shape`, `size`, `empty`, `to_pandas()`, and `names`. For `df.index`, Snowpark pandas creates a lazy index object. For `df.columns`, Snowpark pandas supports a non-lazy version of an `Index`, as the data is already stored locally.
Version 1.18.0 (2024-05-28)¶
Version 1.18.0 of the Snowpark library introduces some new features.
New features¶
- Added the `DataFrame.cache_result` and `Series.cache_result` methods for users to persist `DataFrame` and `Series` objects to a temporary table for the duration of a session to improve the latency of subsequent operations.
Improvements¶
- Added support for `DataFrame.pivot_table` with no `index` parameter and with the `margins` parameter.
- Updated the signatures of `DataFrame.shift`, `Series.shift`, `DataFrameGroupBy.shift`, and `SeriesGroupBy.shift` to match pandas 2.2.1. Snowpark pandas does not yet support the newly added `suffix` argument or sequence values of `periods`.
- Re-added support for `Series.str.split`.
Bug fixes¶
- Fixed an issue with mixed columns for string methods (`Series.str.*`).
Local testing updates¶
New features¶
- Added support for the following `DataFrameReader` read options for the CSV and JSON file formats:
  - `PURGE`
  - `PATTERN`
  - `INFER_SCHEMA` with value `False`
  - `ENCODING` with value `UTF8`
- Added support for `DataFrame.analytics.moving_agg` and `DataFrame.analytics.cumulative_agg_agg`.
- Added support for the `if_not_exists` parameter during UDF and stored procedure registration.
Bug fixes¶
- Fixed a bug with processing time formats where the fractional second part was not handled properly.
- Fixed a bug that caused function calls on `*` to fail.
- Fixed a bug that prevented the creation of `map` and `struct` type objects.
- Fixed a bug where the function `date_add` was unable to handle some numeric types.
- Fixed a bug where `TimestampType` casting resulted in incorrect data.
- Fixed a bug that caused `DecimalType` data to have incorrect precision in some cases.
- Fixed a bug where referencing a missing table or view raised an `IndexError`.
- Fixed a bug where the mocked function `to_timestamp_ntz` could not handle `None` data.
- Fixed a bug where mocked UDFs handled output data of `None` improperly.
- Fixed a bug where `DataFrame.with_column_renamed` ignored attributes from parent `DataFrames` after join operations.
- Fixed a bug where the integer precision of large values was lost when converted to a pandas `DataFrame`.
- Fixed a bug where the schema of a `datetime` object was wrong when creating a `DataFrame` from a pandas `DataFrame`.
- Fixed a bug in the implementation of `Column.equal_nan` where null data was handled incorrectly.
- Fixed a bug where `DataFrame.drop` ignored attributes from parent `DataFrames` after join operations.
- Fixed a bug in the mocked function `date_part` where the column type was set incorrectly.
- Fixed a bug where `DataFrameWriter.save_as_table` did not raise exceptions when inserting null data into non-nullable columns.
- Fixed a bug in the implementation of `DataFrameWriter.save_as_table` where:
  - Append or truncate failed when incoming data had a different schema than the existing table.
  - Truncate failed when incoming data did not specify columns that are nullable.
Improvements¶
- Removed the dependency check for `pyarrow` because it is not used.
- Improved the target type coverage of `Column.cast`, adding support for casting to Boolean and all integral types.
- Aligned the error experience when calling UDFs and stored procedures.
- Added appropriate error messages for the `is_permanent` and `anonymous` options in UDF and stored procedure registration to make it clearer that those features are not yet supported.
- File read operations with unsupported options and values now raise `NotImplementedError` instead of warnings and unclear error information.
Version 1.17.0 (2024-05-21)¶
Version 1.17.0 of the Snowpark library introduces some new features.
New features¶
- Added support for adding a comment on tables and views using the following functions:
  - `DataFrameWriter.save_as_table`
  - `DataFrame.create_or_replace_view`
  - `DataFrame.create_or_replace_temp_view`
  - `DataFrame.create_or_replace_dynamic_table`
Improvements¶
- Improved the error message that reminds users to set `{"infer_schema": True}` when reading a CSV file without specifying its schema.
Local testing updates¶
New features¶
- Added support for `NumericType` and `VariantType` data conversion in the mocked functions `to_timestamp_ltz`, `to_timestamp_ntz`, `to_timestamp_tz`, and `to_timestamp`.
- Added support for `DecimalType`, `BinaryType`, `ArrayType`, `MapType`, `TimestampType`, `DateType`, and `TimeType` data conversion in the mocked function `to_char`.
- Added support for the following APIs:
  - `snowflake.snowpark.functions.to_varchar`
  - `snowflake.snowpark.DataFrame.pivot`
  - `snowflake.snowpark.Session.cancel_all`
- Introduced a new exception class, `snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException`.
- Added support for casting to `FloatType`.
Bug fixes¶
- Fixed a bug where stored procedures and UDFs removed imports already in `sys.path` during the clean-up step.
- Fixed a bug where the fractional second part was not handled properly when processing `datetime` formats.
- Fixed a bug where file operations on the Windows platform were unable to properly handle file separators in directory names.
- Fixed a bug where, on the Windows platform, an `IntervalType` column with integer data could not be processed when reading a pandas DataFrame.
- Fixed a bug that prevented users from selecting multiple columns with the same alias.
- Fixed a bug where `Session.get_current_[schema|database|role|user|account|warehouse]` returned uppercase identifiers when the identifiers were quoted.
- Fixed a bug where the functions `substr` and `substring` could not handle a zero-based `start_expr`.
Improvements¶
- Standardized the error experience by raising `SnowparkLocalTestingException` in error cases, which is on par with the `SnowparkSQLException` raised in non-local execution.
- Improved the error experience of the `Session.write_pandas` method so that `NotImplementedError` is raised when it is called.
- Aligned the error experience with reusing a closed session in non-local execution.
Version 1.16.0 (2024-05-08)¶
Version 1.16.0 of the Snowpark library introduces some new features.
New features¶
- Added `snowflake.snowpark.Session.lineage.trace` to explore the data lineage of Snowflake objects.
- Added support for registering stored procedures with packages given as Python modules.
- Added support for structured type schema parsing.
Bug fixes¶
- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
Local testing updates¶
New features¶
- Added support for `StringType`, `TimestampType`, and `VariantType` data conversion in the mocked function `to_date`.
- Added support for the following APIs:
  - `snowflake.snowpark.functions`:
    - `get`
    - `concat`
    - `concat_ws`
Bug fixes¶
- Fixed a bug that caused `NaT` and `NaN` values to not be recognized.
- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
- Fixed a bug where `DataFrameReader.csv` was unable to handle quoted values containing a delimiter.
- Fixed a bug where, when there is a `None` value in an arithmetic calculation, the output was `math.nan` instead of remaining `None`.
- Fixed a bug in the functions `sum` and `covar_pop` where, when there is a `math.nan` value in the data, the output was not `math.nan`.
- Fixed a bug where stage operations could not handle directories.
- Fixed a bug where `DataFrame.to_pandas` did not treat Snowflake numeric types with precision 38 as `int64`.
.
Version 1.15.0 (2024-04-24)¶
Version 1.15.0 of the Snowpark library introduces some new features.
New features¶
- Added the `truncate` save mode in `DataFrameWriter` to overwrite existing tables by truncating the underlying table instead of dropping it.
- Added telemetry to calculate query plan height and the number of duplicate nodes during collect operations.
- Added the functions below to unload data from a `DataFrame` into one or more files in a stage:
  - `DataFrame.write.json`
  - `DataFrame.write.csv`
  - `DataFrame.write.parquet`
- Added distributed tracing using OpenTelemetry APIs for action functions in `DataFrame` and `DataFrameWriter`:
  - `snowflake.snowpark.DataFrame`: `collect`, `collect_nowait`, `to_pandas`, `count`, `show`
  - `snowflake.snowpark.DataFrameWriter`: `save_as_table`
- Added support for `snow://` URLs in `snowflake.snowpark.Session.file.get` and `snowflake.snowpark.Session.file.get_stream`.
- Added support for registering stored procedures and UDFs with a `comment`.
- UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.
- Added support for dynamic pivot. This feature is currently in private preview.
Improvements¶
- Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). This is still an experimental feature and is not enabled by default. You can enable it by setting `session.cte_optimization_enabled` to `True`.
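Conceptually, the optimization rewrites a subquery that appears more than once into a single CTE that is referenced multiple times. The SQL strings below are a hand-written illustration of the idea, not output captured from Snowpark:

```python
# Before: the same subquery is inlined twice in the generated SQL.
duplicated = (
    "SELECT * FROM (SELECT a FROM t WHERE a > 1) "
    "UNION ALL "
    "SELECT * FROM (SELECT a FROM t WHERE a > 1)"
)

# After: the duplicate subquery is emitted once as a CTE and referenced twice,
# so the server compiles the shared fragment only once.
deduplicated = (
    "WITH cte0 AS (SELECT a FROM t WHERE a > 1) "
    "SELECT * FROM cte0 "
    "UNION ALL "
    "SELECT * FROM cte0"
)

# The duplicated form contains the subquery text twice; the CTE form once.
assert duplicated.count("SELECT a FROM t WHERE a > 1") == 2
assert deduplicated.count("SELECT a FROM t WHERE a > 1") == 1
```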
Bug fixes¶
- Fixed a bug where `statement_params` was not passed to query executions that register stored procedures and user-defined functions.
- Fixed a bug causing `snowflake.snowpark.Session.file.get_stream` to fail for quoted stage locations.
- Fixed a bug where an internal type hint in `utils.py` might raise `AttributeError` when the underlying module cannot be found.
Local testing updates¶
New features¶
- Added support for registering UDFs and stored procedures.
- Added support for the following APIs:
  - `snowflake.snowpark.Session`:
    - `file.put`
    - `file.put_stream`
    - `file.get`
    - `file.get_stream`
    - `read.json`
    - `add_import`
    - `remove_import`
    - `get_imports`
    - `clear_imports`
    - `add_packages`
    - `add_requirements`
    - `clear_packages`
    - `remove_package`
    - `udf.register`
    - `udf.register_from_file`
    - `sproc.register`
    - `sproc.register_from_file`
  - `snowflake.snowpark.functions`:
    - `current_database`
    - `current_session`
    - `date_trunc`
    - `object_construct`
    - `object_construct_keep_null`
    - `pow`
    - `sqrt`
    - `udf`
    - `sproc`
- Added support for `StringType`, `TimestampType`, and `VariantType` data conversion in the mocked function `to_time`.
Bug fixes¶
- Fixed a bug that null-filled columns for constant functions.
- Fixed `to_object`, `to_array`, and `to_binary` to better handle null inputs.
- Fixed a bug where timestamp data comparison could not handle years beyond 2262.
- Fixed a bug where `Session.builder.getOrCreate` did not return the created mock session.
Version 1.14.0 (2024-03-20)¶
Version 1.14.0 of the Snowpark library introduces some new features.
New features¶
- Added support for creating vectorized UDTFs with the `process` method.
- Added support for the following dataframe functions:
  - `to_timestamp_ltz`
  - `to_timestamp_ntz`
  - `to_timestamp_tz`
  - `locate`
- Added support for the ASOF JOIN type.
- Added support for the following local testing APIs:
  - `snowflake.snowpark.functions`:
    - `to_double`
    - `to_timestamp`
    - `to_timestamp_ltz`
    - `to_timestamp_ntz`
    - `to_timestamp_tz`
    - `greatest`
    - `least`
    - `convert_timezone`
    - `dateadd`
    - `date_part`
  - `snowflake.snowpark.Session`:
    - `get_current_account`
    - `get_current_warehouse`
    - `get_current_role`
    - `use_schema`
    - `use_warehouse`
    - `use_database`
    - `use_role`
Improvements¶
- Added telemetry to local testing.
- Improved the error message of `DataFrameReader` to raise a `FileNotFound` error when reading a path that does not exist or when there are no files under the path.
Bug fixes¶
- Fixed a bug in `SnowflakePlanBuilder` where `save_as_table` does not correctly filter columns whose names start with `$` and are followed by a number.
- Fixed a bug where statement parameters might have no effect when resolving imports and packages.
- Fixed bugs in local testing:
  - LEFT ANTI and LEFT SEMI joins drop rows with null values.
  - `DataFrameReader.csv` incorrectly parses data when the optional parameter `field_optionally_enclosed_by` is specified.
  - `Column.regexp` only considers the first entry when `pattern` is a `Column`.
  - `Table.update` raises `KeyError` when updating null values in the rows.
  - VARIANT columns raise errors at `DataFrame.collect`.
  - `count_distinct` does not work correctly when counting.
  - Null values in integer columns raise `TypeError`.
Version 1.13.0 (2024-02-26)¶
Version 1.13.0 of the Snowpark library introduces some new features.
New features¶
- Added support for an optional `date_part` argument in the function `last_day`.
- `SessionBuilder.app_name` will set the `query_tag` after the session is created.
- Added support for the following local testing functions:
  - `current_timestamp`
  - `current_date`
  - `current_time`
  - `strip_null_value`
  - `upper`
  - `lower`
  - `length`
  - `initcap`
Improvements¶
- Added cleanup logic at interpreter shutdown to close all active sessions.
Bug fixes¶
- Fixed a bug in `DataFrame.to_local_iterator` where the iterator could yield wrong results if another query is executed before the iterator finishes, due to a wrong isolation level.
- Fixed a bug that truncated table names in error messages while running a plan with local testing enabled.
- Fixed a bug where `Session.range` returns an empty result when the range is large.
Version 1.12.1 (2024-02-08)¶
Version 1.12.1 of the Snowpark library introduces some new features.
Improvements¶
- Use `split_blocks=True` by default during `to_pandas` conversion for optimal memory allocation. This parameter is passed to `pyarrow.Table.to_pandas`, which enables PyArrow to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.
Bug fixes¶
- Fixed a bug in `DataFrame.to_pandas` that caused an error when evaluating a DataFrame with an `IntegerType` column with null values.
Version 1.12.0 (2024-01-29)¶
Version 1.12.0 of the Snowpark library introduces some new features.
Behavior changes (API compatible)¶
- When parsing data types during a `to_pandas` operation, we rely on the GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as `int8` is now returned as `int64`. Users can fix this by explicitly specifying precision values for their return column.
- Aligned the behavior of `Session.call` in the case of table stored procedures, where running `Session.call` would not trigger the stored procedure unless a `collect()` operation was performed.
- `StoredProcedureRegistration` now automatically adds `snowflake-snowpark-python` as a package dependency, using the client's local version of the library. An error is thrown if the server cannot support that version.
New features¶
- Exposed `statement_params` in `StoredProcedure.__call__`.
- Added two optional arguments to `Session.add_import`:
  - `chunk_size`: The number of bytes to hash per chunk of the uploaded files.
  - `whole_file_hash`: By default, only the first chunk of the uploaded import is hashed to save time. When this is set to `True`, each uploaded file is fully hashed instead.
- Added the parameters `external_access_integrations` and `secrets` for creating a UDAF from Snowpark Python to allow integration with external access.
- Added a new method, `Session.append_query_tag`, which allows an additional tag to be added to the current query tag by appending it as a comma-separated value.
- Added a new method, `Session.update_query_tag`, which allows updates to a JSON-encoded dictionary query tag.
- `SessionBuilder.getOrCreate` will now attempt to replace the singleton it returns when token expiration has been detected.
- Added the following functions in `snowflake.snowpark.functions`:
  - `array_except`
  - `create_map`
  - `sign`/`signum`
- Added the following functions to `DataFrame.analytics`:
  - The `moving_agg` function, to enable moving aggregations like sums and averages with multiple window sizes.
  - The `cummulative_agg` function, to enable cumulative aggregations like sums and averages.
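The trade-off behind the two new `Session.add_import` arguments can be sketched with a small stdlib-only function; `import_hash` is hypothetical and stands in for Snowpark's internal hashing, which is not shown in the source:

```python
import hashlib

def import_hash(data: bytes, chunk_size: int = 8192, whole_file_hash: bool = False) -> str:
    """Hash an uploaded import: by default only the first chunk_size bytes
    are hashed to save time; whole_file_hash=True hashes every byte."""
    digest = hashlib.sha256()
    digest.update(data if whole_file_hash else data[:chunk_size])
    return digest.hexdigest()

payload = bytes(20000)  # a 20 KB stand-in for an uploaded file
fast = import_hash(payload)                        # first 8192 bytes only
full = import_hash(payload, whole_file_hash=True)  # all 20000 bytes
print(fast != full)  # True: the two strategies hash different byte ranges here
```

The consequence of the default is that two files differing only past the first chunk would hash identically; setting `whole_file_hash=True` trades speed for an exact fingerprint.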
Bug fixes¶
- Fixed a bug in `DataFrame.na.fill` that caused Boolean values to erroneously override integer values.
- Fixed a bug in `Session.create_dataframe` where Snowpark DataFrames created from pandas DataFrames did not infer the type for timestamp columns correctly. The behavior is as follows:
  - Earlier, timestamp columns without a timezone would be converted to nanosecond epochs and inferred as `LongType()`; they are now correctly maintained as timestamp values and inferred as `TimestampType(TimestampTimeZone.NTZ)`.
  - Earlier, timestamp columns with a timezone would be inferred as `TimestampType(TimestampTimeZone.NTZ)` and lose timezone information; they are now correctly inferred as `TimestampType(TimestampTimeZone.LTZ)`, and timezone information is retained correctly.
  - Set the session parameter `PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME` to revert to the old behavior. Snowflake recommends that you update your code to align with the correct behavior because the parameter will be removed in the future.
- Fixed a bug where `DataFrame.to_pandas` gets a decimal type when the scale is not 0 and creates an object dtype in pandas. Instead, the value is cast to a float64 type.
- Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
  - `DataFrame.filter()` is called after `DataFrame.sort().limit()`.
  - `DataFrame.sort()` or `filter()` is called on a DataFrame that already has a window function or sequence-dependent data generator column. For instance, `df.select("a", seq1().alias("b")).select("a", "b").sort("a")` won't flatten the sort clause anymore.
  - A window or sequence-dependent data generator column is used after `DataFrame.limit()`. For instance, `df.limit(10).select(row_number().over())` won't flatten the limit and select in the generated SQL.
- Fixed a bug where aliasing a DataFrame column raised an error when the DataFrame was copied from another DataFrame with an aliased column. For instance:

  ```python
  df = df.select(col("a").alias("b"))
  df = copy(df)
  df.select(col("b").alias("c"))  # Threw an error. Now it's fixed.
  ```

- Fixed a bug in `Session.create_dataframe` where the non-nullable field in a schema is not respected for the Boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.
- Fixed a bug in the SQL simplifier where non-select statements in `session.sql` dropped a SQL query when used with `limit()`.
- Fixed a bug that raised an exception when the session parameter `ERROR_ON_NONDETERMINISTIC_UPDATE` is true.