Changes to the Snowpark Python API¶
This topic describes the changes introduced in each release of the Snowpark Python library.
Version 1.4.0 (2023-04-24)¶
Version 1.4.0 of the Snowpark library introduces some new features.
New Features¶
- Added support for `session.getOrCreate`.
- Added support for alias `Column.getField`.
- Added support for new functions in `snowflake.snowpark.functions`:
  - `date_add` and `date_sub` to make add and subtract operations easier.
  - `daydiff`
  - `explode`
  - `array_distinct`
  - `regexp_extract`
  - `struct`
  - `format_number`
  - `bround`
  - `substring_index`
- Added parameter `skip_upload_on_content_match` when creating UDFs, UDTFs, and stored procedures using `register_from_file` to skip uploading files to a stage if the same version of the files is already on the stage.
- Added support for the `DataFrame.save_as_table` method to take table names that contain dots.
- Flattened generated SQL when `DataFrame.filter()` or `DataFrame.order_by()` is followed by a projection statement (e.g. `DataFrame.select()`, `DataFrame.with_column()`).
- Added support for creating dynamic tables (in private preview) using `DataFrame.create_or_replace_dynamic_table`.
- Added an optional argument, `params`, in `session.sql()` to support binding variables. Note that this argument is not yet supported in stored procedures.
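The new `substring_index` function follows the usual SQL SUBSTRING_INDEX convention. The following is a pure-Python sketch of that behavior (a local illustration only, not Snowpark code):

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Return the part of s before the count-th occurrence of delim.

    A negative count takes the part after the count-th occurrence
    counted from the end, mirroring SQL's SUBSTRING_INDEX.
    """
    parts = s.split(delim)
    if count >= 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])

print(substring_index("www.snowflake.com", ".", 2))   # www.snowflake
print(substring_index("www.snowflake.com", ".", -1))  # com
```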
Bug Fixes¶
- Fixed a bug in `strtok_to_array` where an exception was thrown when a delimiter was passed in.
- Fixed a bug in `session.add_import` where the module had the same namespace as other dependencies.
Version 1.3.0 (2023-03-28)¶
Version 1.3.0 of the Snowpark library introduces some new features.
New Features¶
- Added support for the `delimiters` parameter in `functions.initcap()`.
- Added support for `functions.hash()` to accept a variable number of input expressions.
- Added API `Session.conf` for getting, setting, or checking the mutability of any runtime configuration.
- Added support for managing case sensitivity in `Row` results from `DataFrame.collect` using the `case_sensitive` parameter.
- Added indexer support for `snowflake.snowpark.types.StructType`.
- Added a keyword argument `log_on_exception` to `DataFrame.collect` and `DataFrame.collect_nowait` to optionally disable error logging for SQL exceptions.
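Case-insensitive field lookup in result rows can be pictured with a small local sketch. This toy class is not the actual `Row` implementation; it just illustrates what `case_sensitive=False` means for field access:

```python
class CaseInsensitiveRow:
    """Toy stand-in for a result row whose fields can be looked up
    without regard to case (what case_sensitive=False enables)."""

    def __init__(self, **fields):
        # Normalize keys so lookups ignore case.
        self._fields = {k.upper(): v for k, v in fields.items()}

    def __getitem__(self, name: str):
        return self._fields[name.upper()]

row = CaseInsensitiveRow(ID=1, NAME="snowpark")
print(row["name"], row["Name"], row["NAME"])  # snowpark snowpark snowpark
```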
Bug Fixes¶
- Fixed a bug where a DataFrame set operation (`DataFrame.subtract`, `DataFrame.union`, etc.) called after another DataFrame set operation and `DataFrame.select` or `DataFrame.with_column` threw an exception.
- Fixed a bug where chained sort statements were overwritten by the SQL simplifier.
Improvements¶
- Simplified JOIN queries to use constant subquery aliases (`SNOWPARK_LEFT`, `SNOWPARK_RIGHT`) by default. Users can disable this at runtime with `session.conf.set('use_constant_subquery_alias', False)` to use randomly generated alias names instead.
- Allowed specifying statement parameters in `session.call()`.
- Enabled the uploading of large pandas DataFrames in stored procedures by defaulting to a chunk size of 100,000 rows.
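Chunked uploading amounts to splitting the frame's rows into fixed-size batches. A minimal sketch, using a hypothetical `chunk_rows` helper (not part of the Snowpark API):

```python
def chunk_rows(rows, chunk_size=100_000):
    """Yield successive chunks of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

rows = list(range(250_000))
sizes = [len(c) for c in chunk_rows(rows)]
print(sizes)  # [100000, 100000, 50000]
```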
Version 1.2.0¶
Version 1.2.0 of the Snowpark library introduces some new features.
New Features¶
- Added support for displaying source code as comments in the generated scripts when registering stored procedures. This is enabled by default; turn it off by specifying `source_code_display=False` at registration.
- Added a parameter `if_not_exists` when creating a UDF, UDTF, or stored procedure from Snowpark Python to ignore creating the specified function or procedure if it already exists.
- Accept integers when calling `snowflake.snowpark.functions.get` to extract a value from an array.
- Added `functions.reverse` in functions to open access to the Snowflake built-in function REVERSE.
- Added parameter `require_scoped_url` in `snowflake.snowpark.files.SnowflakeFile.open()` (in private preview) to replace `is_owner_file`, which is marked for deprecation.
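Snowflake's GET returns NULL rather than raising for an out-of-range index, which is what integer arguments to `functions.get` map onto. A pure-Python sketch of that semantics (local illustration only):

```python
def get(arr, index):
    """Return arr[index], or None (standing in for SQL NULL)
    when the index is out of range."""
    if 0 <= index < len(arr):
        return arr[index]
    return None

print(get(["a", "b", "c"], 1))   # b
print(get(["a", "b", "c"], 10))  # None
```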
Bug Fixes¶
- Fixed a bug that overwrote `paramstyle` to `qmark` when creating a Snowpark session.
- Fixed a bug where `df.join(..., how="cross")` failed with `SnowparkJoinException: (1112): Unsupported using join type 'Cross'`.
- Fixed a bug where querying a `DataFrame` column created from chained function calls used a wrong column name.
Version 1.1.0¶
Version 1.1.0 of the Snowpark library introduces some new features.
New Features¶
- Added `asc`, `asc_nulls_first`, `asc_nulls_last`, `desc`, `desc_nulls_first`, `desc_nulls_last`, `date_part`, and `unix_timestamp` in functions.
- Added the property `DataFrame.dtypes` to return a list of column name and data type pairs.
- Added the following aliases:
  - `functions.expr()` for `functions.sql_expr()`.
  - `functions.date_format()` for `functions.to_date()`.
  - `functions.monotonically_increasing_id()` for `functions.seq8()`.
  - `functions.from_unixtime()` for `functions.to_timestamp()`.
Bug Fixes¶
- Fixed a bug in the SQL simplifier that didn't handle column aliases and joins well in some cases. See https://github.com/snowflakedb/snowpark-python/issues/658 for details.
- Fixed a bug in the SQL simplifier that generated wrong column names for function calls, `NaN`, and `INF`.
Improvements¶
The session parameter `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` will be `True` after Snowflake 7.3 is released. In snowpark-python, `session.sql_simplifier_enabled` reads the value of `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` by default, meaning that the SQL simplifier is enabled by default after the Snowflake 7.3 release. To turn this off, set `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` in Snowflake to `False` or run `session.sql_simplifier_enabled = False` from Snowpark. Using the SQL simplifier is recommended because it helps generate more concise SQL.
Version 1.0.0¶
Version 1.0.0 of the Snowpark library introduces some new features.
Version 0.12.0¶
Version 0.12.0 of the Snowpark library introduces some new features and improvements.
New Features¶
Added new APIs for async job:
Session.create_async_job()
to create anAsyncJob
instance from a query id.AsyncJob.result()
now accepts the argumentresult_type
to return the results in different formats.AsyncJob.to_df()
returns aDataFrame
built from the result of this asynchronous job.AsyncJob.query()
returns the SQL text of the executed query.
DataFrame.agg()
andRelationalGroupedDataFrame.agg()
now accept variable-length arguments.Added parameters
lsuffix
andrsuffix
toDataFrame.join()
andDataFrame.cross_join()
to conveniently rename overlapping columns.Added
Table.drop_table()
so you can drop the temp table after callingDataFrame.cache_result()
.Table
is also a context manager, so you can use thewith
statement to drop the cache temp table after use.Added
Session.use_secondary_roles()
.Added functions
first_value()
andlast_value()
. (contributed by @chasleslr)Added
on
as an alias forusing_columns
andhow
as an alias forjoin_type
inDataFrame.join()
.
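The `lsuffix`/`rsuffix` behavior for overlapping join columns can be sketched in plain Python. The `rename_overlapping` helper below is hypothetical, illustrating only the renaming rule, not the join itself:

```python
def rename_overlapping(left_cols, right_cols, lsuffix="_l", rsuffix="_r"):
    """Rename columns that appear on both sides of a join by
    appending the given suffixes, leaving unique columns untouched."""
    overlap = set(left_cols) & set(right_cols)
    new_left = [c + lsuffix if c in overlap else c for c in left_cols]
    new_right = [c + rsuffix if c in overlap else c for c in right_cols]
    return new_left, new_right

print(rename_overlapping(["id", "a"], ["id", "b"]))
# (['id_l', 'a'], ['id_r', 'b'])
```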
Bug Fixes¶
- Fixed a bug in `Session.create_dataframe()` that raised an error when `schema` names had special characters.
- Fixed a bug in which options set in `Session.read.option()` were not passed to `DataFrame.copy_into_table()` as default values.
- Fixed a bug in which `DataFrame.copy_into_table()` raised an error when a copy option had single quotes in the value.
Version 0.11.0¶
Version 0.11.0 of the Snowpark library introduces some new features and improvements.
Behavior Changes¶
- `Session.add_packages()` now raises a `ValueError` when the version of a package cannot be found in the Snowflake Anaconda channel. Previously, `Session.add_packages()` succeeded and a `SnowparkSQLException` was raised later, in the UDF or stored procedure registration step.
New Features¶
- Added method `FileOperation.get_stream()` to support downloading stage files as a stream.
- Added support in `functions.ntiles()` to accept an `int` argument.
- Added the following aliases:
  - `functions.call_function()` for `functions.call_builtin()`.
  - `functions.function()` for `functions.builtin()`.
  - `DataFrame.order_by()` for `DataFrame.sort()`.
  - `DataFrame.orderBy()` for `DataFrame.sort()`.
- Improved `DataFrame.cache_result()` to return a more accurate `Table` class instead of a `DataFrame` class.
- Added support to allow `session` as the first argument when calling `StoredProcedure`.
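NTILE(n), the window function behind `functions.ntiles()`, distributes ordered rows into n roughly equal buckets, with earlier buckets taking any extra rows. A local sketch of the bucketing (not Snowpark code):

```python
def ntile(n: int, num_rows: int):
    """Assign 1-based bucket numbers to num_rows ordered rows,
    with earlier buckets taking the extra rows, as NTILE(n) does."""
    base, extra = divmod(num_rows, n)
    buckets = []
    for b in range(1, n + 1):
        size = base + (1 if b <= extra else 0)
        buckets.extend([b] * size)
    return buckets

print(ntile(3, 7))  # [1, 1, 1, 2, 2, 3, 3]
```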
Improvements¶
- Improved nested query generation by flattening queries when applicable. This improvement can be enabled by setting `Session.sql_simplifier_enabled = True`.
  - `DataFrame.select()`, `DataFrame.with_column()`, `DataFrame.drop()`, and other select-related APIs now generate more flattened SQL.
  - `DataFrame.union()`, `DataFrame.union_all()`, `DataFrame.except_()`, `DataFrame.intersect()`, and `DataFrame.union_by_name()` generate flattened SQL when multiple set operators are chained.
- Improved type annotations for async job APIs.
Bug Fixes¶
- Fixed a bug in which `Table.update()`, `Table.delete()`, and `Table.merge()` tried to reference a temp table that did not exist.
Version 0.10.0¶
Version 0.10.0 of the Snowpark library introduces some new features and improvements.
New Features¶
- Added experimental APIs for evaluating Snowpark dataframes with asynchronous queries:
  - Added keyword argument `block` to the following action APIs on Snowpark dataframes (which execute queries) to allow asynchronous evaluations: `DataFrame.collect()`, `DataFrame.to_local_iterator()`, `DataFrame.to_pandas()`, `DataFrame.to_pandas_batches()`, `DataFrame.count()`, `DataFrame.first()`, `DataFrameWriter.save_as_table()`, `DataFrameWriter.copy_into_location()`, `Table.delete()`, `Table.update()`, and `Table.merge()`.
  - Added method `DataFrame.collect_nowait()` to allow asynchronous evaluations.
  - Added class `AsyncJob` to retrieve results from asynchronously executed queries and check their status.
- Added support for `table_type` in `Session.write_pandas()`. You can now choose from these `table_type` options: `temporary`, `temp`, and `transient`.
- Added support for using Python structured data (`list`, `tuple`, and `dict`) as literal values in Snowpark.
- Added keyword argument `execute_as` to `functions.sproc()` and `session.sproc.register()` to allow registering a stored procedure as a caller or owner.
- Added support for specifying a pre-configured file format when reading files from a stage in Snowflake.
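The `block=False` pattern resembles the standard submit-now, fetch-later workflow. As a rough stdlib analogy (not Snowpark code, and `run_query` is a placeholder, not a real API):

```python
from concurrent.futures import ThreadPoolExecutor

def run_query(sql: str):
    # Placeholder for a server-side query; returns a fake result set.
    return [("row", 1), ("row", 2)]

with ThreadPoolExecutor() as pool:
    job = pool.submit(run_query, "select * from t")  # like calling with block=False
    # ... do other work while the "query" runs ...
    result = job.result()  # like AsyncJob.result(): wait for and fetch the result

print(result)
```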
Improvements¶
Added support for displaying details of a Snowpark session.
Bug Fixes¶
- Fixed a bug in which `DataFrame.copy_into_table()` and `DataFrameWriter.save_as_table()` mistakenly created a new table when the table name was fully qualified and the table already existed.
Deprecations¶
- Deprecated keyword argument `create_temp_table` in `Session.write_pandas()`.
- Deprecated invoking UDFs using arguments wrapped in a Python list or tuple. You can use variable-length arguments without a list or tuple.
Dependency updates¶
- Updated `snowflake-connector-python` to 2.7.12.
Version 0.9.0¶
Version 0.9.0 of the Snowpark library introduces some new features and improvements.
New Features¶
- Added support for displaying source code as comments in the generated scripts when registering UDFs. This feature is turned on by default. To turn it off, pass the new keyword argument `source_code_display` as `False` when calling `register()` or `@udf()`.
- Added support for calling table functions from `DataFrame.select()`, `DataFrame.with_column()`, and `DataFrame.with_columns()`, which now take parameters of type `table_function.TableFunctionCall` for columns.
- Added keyword argument `overwrite` to `session.write_pandas()` to allow you to overwrite the contents of a Snowflake table with those of a Pandas DataFrame.
- Added keyword argument `column_order` to `df.write.save_as_table()` to specify the matching rules when inserting data into a table in append mode.
- Added method `FileOperation.put_stream()` to upload local files to a stage via a file stream.
- Added methods `TableFunctionCall.alias()` and `TableFunctionCall.as_()` to allow aliasing the names of columns that come from the output of table function joins.
- Added function `get_active_session()` in module `snowflake.snowpark.context` to get the currently active Snowpark session.
Improvements¶
- Improved the function `function.uniform()` to infer the types of the inputs `max_` and `min_` and cast the limits to `IntegerType` or `FloatType`, respectively.
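The inference rule described above can be sketched locally: integer limits on both ends yield an integer result type, anything else a float type. This is an illustration of the rule only, not the library's implementation:

```python
def infer_uniform_type(min_, max_) -> str:
    """Infer the result type for uniform(min_, max_): integers on
    both ends give an integer type, otherwise a float type."""
    if isinstance(min_, int) and isinstance(max_, int):
        return "IntegerType"
    return "FloatType"

print(infer_uniform_type(0, 10))    # IntegerType
print(infer_uniform_type(0, 10.5))  # FloatType
```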
Bug Fixes¶
- Fixed a bug in which batch insert raised an error when `statement_params` was not passed to the function.
- Fixed a bug in which column names were not quoted when `session.create_dataframe()` was called with dicts and a given schema.
- Fixed a bug in which creation of a table was not skipped when the table already existed and `df.write.save_as_table()` was called in append mode.
- Fixed a bug in which third-party packages with underscores could not be added when registering UDFs.
Version 0.8.0¶
Version 0.8.0 of the Snowpark library introduces some new features and improvements.
New Features¶
- Added keyword-only argument `statement_params` to the following methods to allow specifying statement-level parameters:
  - `collect`, `to_local_iterator`, `to_pandas`, `to_pandas_batches`, `count`, `copy_into_table`, `show`, `create_or_replace_view`, `create_or_replace_temp_view`, `first`, `cache_result`, and `random_split` on class `snowflake.snowpark.DataFrame`.
  - `update`, `delete`, and `merge` on class `snowflake.snowpark.Table`.
  - `save_as_table` and `copy_into_location` on class `snowflake.snowpark.DataFrameWriter`.
  - `approx_quantile`, `statement_params`, `cov`, and `crosstab` on class `snowflake.snowpark.DataFrameStatFunctions`.
  - `register` and `register_from_file` on class `snowflake.snowpark.udf.UDFRegistration`.
  - `register` and `register_from_file` on class `snowflake.snowpark.udtf.UDTFRegistration`.
  - `register` and `register_from_file` on class `snowflake.snowpark.stored_procedure.StoredProcedureRegistration`.
  - `udf`, `udtf`, and `sproc` in `snowflake.snowpark.functions`.
- Added support for `Column` as an input argument to `session.call()`.
- Added support for `table_type` in `df.write.save_as_table()`. You can now choose from these `table_type` options: `temporary`, `temp`, and `transient`.
Improvements¶
- Added validation of object names in `session.use_*` methods.
- Updated the query tag in SQL so that it is escaped when it contains special characters.
- Added a check to see if Anaconda terms are acknowledged when adding missing packages.
Bug Fixes¶
- Fixed the limited length of the string column in `session.create_dataframe()`.
- Fixed a bug in which `session.create_dataframe()` mistakenly converted 0 and `False` to `None` when the input data was only a list.
- Fixed a bug in which calling `session.create_dataframe()` with a large local dataset sometimes created a temp table twice.
- Aligned the definition of `function.trim()` with the SQL function definition.
- Fixed an issue where snowpark-python would hang when the Python built-in `sum` was used instead of the Snowpark `function.sum()`.
Version 0.7.0¶
Version 0.7.0 of the Snowpark library introduces some new features and improvements.
New Features¶
- Added support for user-defined table functions (UDTFs).
  - Use the function `snowflake.snowpark.functions.udtf()` to register a UDTF, or use it as a decorator to register the UDTF.
  - You can also use `Session.udtf.register()` to register a UDTF.
  - Use `Session.udtf.register_from_file()` to register a UDTF from a Python file.
- Updated APIs to query a table function, including both Snowflake built-in table functions and UDTFs.
  - Use the function `snowflake.snowpark.functions.table_function()` to create a callable representing a table function and use it to call the table function in a query.
  - Alternatively, use the function `snowflake.snowpark.functions.call_table_function()` to call a table function.
  - Added support for the `over` clause, which specifies `partition by` and `order by` when lateral joining a table function.
  - Updated `Session.table_function()` and `DataFrame.join_table_function()` to accept `TableFunctionCall` instances.
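A UDTF handler is a class whose `process` method yields zero or more output rows per input row. The sketch below exercises that contract locally in plain Python; the `SplitWords` class is a made-up example and is not registered with Snowflake here:

```python
class SplitWords:
    """Toy UDTF-style handler: one input string row in,
    one output row (a tuple) per word out."""

    def process(self, text: str):
        for word in text.split():
            yield (word,)

handler = SplitWords()
rows = [r for text in ["hello world", "snowpark"] for r in handler.process(text)]
print(rows)  # [('hello',), ('world',), ('snowpark',)]
```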
Breaking Changes¶
- When creating a function with `functions.udf()` and `functions.sproc()`, you can now specify an empty list for the `imports` or `packages` argument to indicate that no import or package is used for this UDF or stored procedure. Previously, specifying an empty list meant that the function would use session-level imports or packages.
- Improved the `__repr__` implementation of data types in `types.py`. The unused `type_name` property has been removed.
- Added a Snowpark-specific exception class for SQL errors. This replaces the previous `ProgrammingError` from the Python connector.
Improvements¶
- Added a lock to a UDF or UDTF when it is called for the first time per thread.
- Improved the error message for pickling errors that occurred during UDF creation.
- Included the query ID when logging the failed query.
Bug Fixes¶
- Fixed a bug in which non-integral data (such as timestamps) was occasionally converted to integers when calling `DataFrame.to_pandas()`.
- Fixed a bug in which `DataFrameReader.parquet()` failed to read a Parquet file when a column name contained spaces.
- Fixed a bug in which `DataFrame.copy_into_table()` failed when the dataframe was created by reading a file with inferred schemas.
Deprecations¶
- Deprecated `Session.flatten()` and `DataFrame.flatten()`.
Dependency Updates¶
- Restricted the version of `cloudpickle` to `<=2.0.0`.
Version 0.6.0¶
Version 0.6.0 of the Snowpark library introduces some new features and improvements.
New Features¶
- Added support for vectorized UDFs via the Python UDF Batch API. The Python UDF Batch API enables defining Python functions that receive batches of input rows as Pandas DataFrames and return batches of results as Pandas arrays or Series. This can improve the performance of UDFs in Snowpark.
- Added support for inferring the schema of a DataFrame by default when it is created by reading a Parquet, Avro, or ORC file in the stage.
- Added functions `current_session()`, `current_statement()`, `current_user()`, `current_version()`, `current_warehouse()`, `date_from_parts()`, `date_trunc()`, `dayname()`, `dayofmonth()`, `dayofweek()`, `dayofyear()`, `grouping()`, `grouping_id()`, `hour()`, `last_day()`, `minute()`, `next_day()`, `previous_day()`, `second()`, `month()`, `monthname()`, `quarter()`, `year()`, `current_database()`, `current_role()`, `current_schema()`, `current_schemas()`, `current_region()`, `current_available_roles()`, `add_months()`, `any_value()`, `bitnot()`, `bitshiftleft()`, `bitshiftright()`, `convert_timezone()`, `uniform()`, `strtok_to_array()`, `sysdate()`, `time_from_parts()`, `timestamp_from_parts()`, `timestamp_ltz_from_parts()`, `timestamp_ntz_from_parts()`, `timestamp_tz_from_parts()`, `weekofyear()`, and `percentile_cont()` to `snowflake.snowpark.functions`.
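The scalar-vs-batch distinction behind vectorized UDFs can be shown without Snowflake at all. In this local sketch, plain lists stand in for the pandas Series/DataFrames that the real Batch API passes:

```python
def plus_one_scalar(x: int) -> int:
    # A scalar UDF is called once per row.
    return x + 1

def plus_one_batch(batch):
    # A vectorized UDF is called once per batch of rows; the real
    # Batch API passes pandas objects, a plain list stands in here.
    return [x + 1 for x in batch]

values = [1, 2, 3]
# Both styles compute the same result; the batch form amortizes
# per-call overhead across many rows.
assert [plus_one_scalar(v) for v in values] == plus_one_batch(values)
print(plus_one_batch(values))  # [2, 3, 4]
```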
Improvements¶
- Added support for creating an empty `DataFrame` with a specific schema using the `Session.create_dataframe()` method.
- Changed the logging level from `INFO` to `DEBUG` for several logs (e.g., the executed query) when evaluating a dataframe.
- Improved the error message when failing to create a UDF due to pickle errors.
- Removed the following APIs that were deprecated in 0.4.0: `DataFrame.groupByGroupingSets()`, `DataFrame.naturalJoin()`, `DataFrame.joinTableFunction`, `DataFrame.withColumns()`, `Session.getImports()`, `Session.addImport()`, `Session.removeImport()`, `Session.clearImports()`, `Session.getSessionStage()`, `Session.getDefaultDatabase()`, `Session.getDefaultSchema()`, `Session.getCurrentDatabase()`, `Session.getCurrentSchema()`, and `Session.getFullyQualifiedCurrentSchema()`.
- Added `typing-extension` as a new dependency with version `>=4.1.0`.
Bug Fixes¶
- Removed the hard dependency on pandas in the `Session.create_dataframe()` method.
Version 0.5.0¶
Version 0.5.0 of the Snowpark library introduces some new features and improvements.
New Features¶
- Added the stored procedures API.
  - Added the `Session.sproc` property and `sproc()` to `snowflake.snowpark.functions`, so you can register stored procedures.
  - Added `Session.call` to call stored procedures by name.
- Added `UDFRegistration.register_from_file()` to allow registering UDFs from Python source files or zip files directly.
- Added `UDFRegistration.describe()` to describe a UDF.
- Added `DataFrame.random_split()` to provide a way to randomly split a dataframe.
- Added functions `md5()`, `sha1()`, `sha2()`, `ascii()`, `initcap()`, `length()`, `lower()`, `lpad()`, `ltrim()`, `rpad()`, `rtrim()`, `repeat()`, `soundex()`, `regexp_count()`, `replace()`, `charindex()`, `collate()`, `collation()`, `insert()`, `left()`, `right()`, and `endswith()` to `snowflake.snowpark.functions`.
- The `call_udf()` function now also accepts literal values.
- Provided a `distinct` keyword in `array_agg()`.
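The new hash functions correspond to standard digests. A local illustration with Python's `hashlib` (the hex-digest output format matches what the SQL MD5 function returns for a string input):

```python
import hashlib

def md5_hex(s: str) -> str:
    """Hex MD5 digest of a UTF-8 string, the same output format
    as the SQL MD5 function for string input."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()

digest = md5_hex("Snowflake")
print(digest)
assert len(digest) == 32  # MD5 hex digests are always 32 characters
```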
Bug Fixes¶
- Fixed an issue that caused `DataFrame.to_pandas()` to produce a string column if `Column.cast(IntegerType())` was used.
- Fixed a bug in `DataFrame.describe()` when there is more than one string column.
Version 0.4.0¶
Version 0.4.0 of the Snowpark library introduces some new features and improvements.
New Features¶
- You can now specify which Anaconda packages to use when defining UDFs.
  - Added `add_packages()`, `get_packages()`, `clear_packages()`, and `remove_package()` to class `Session`.
  - Added `add_requirements()` to `Session` so you can use a requirements file to specify which packages this session will use.
  - Added parameter `packages` to function `snowflake.snowpark.functions.udf()` and method `UserDefinedFunction.register()` to indicate UDF-level Anaconda package dependencies when creating a UDF.
  - Added parameter `imports` to `snowflake.snowpark.functions.udf()` and `UserDefinedFunction.register()` to specify UDF-level code imports.
- Added a parameter `session` to function `udf()` and `UserDefinedFunction.register()` so you can specify which session to use to create a UDF if you have multiple sessions.
- Added types `Geography` and `Variant` to `snowflake.snowpark.types` to be used as type hints for Geography and Variant data when defining a UDF.
- Added support for Geography geoJSON data.
- Added `Table`, a subclass of `DataFrame` for table operations.
  - Methods `update` and `delete` update and delete rows of a table in Snowflake.
  - Method `merge` merges data from a `DataFrame` to a `Table`.
  - Overrode method `DataFrame.sample()` with an additional parameter `seed`, which works on tables but not on views and subqueries.
- Added `DataFrame.to_local_iterator()` and `DataFrame.to_pandas_batches()` to allow getting results from an iterator when the result set returned from the Snowflake database is too large.
- Added `DataFrame.cache_result()` for caching the operations performed on a `DataFrame` in a temporary table. Subsequent operations on the original `DataFrame` have no effect on the cached result `DataFrame`.
- Added property `DataFrame.queries` to get the SQL queries that will be executed to evaluate the `DataFrame`.
- Added `Session.query_history()` as a context manager to track SQL queries executed on a session, including all SQL queries used to evaluate `DataFrames` created from the session. Both query ID and query text are recorded.
- You can now create a `Session` instance from an existing, established `snowflake.connector.SnowflakeConnection` by using the parameter `connection` in `Session.builder.configs()`.
- Added `use_database()`, `use_schema()`, `use_warehouse()`, and `use_role()` to class `Session` to switch the database/schema/warehouse/role after a session is created.
- Added `DataFrameWriter.copy_into_location()` to unload a `DataFrame` to stage files.
- Added `DataFrame.unpivot()`.
- Added `Column.within_group()` for sorting the rows by columns with some aggregation functions.
- Added functions `listagg()`, `mode()`, `div0()`, `acos()`, `asin()`, `atan()`, `atan2()`, `cos()`, `cosh()`, `sin()`, `sinh()`, `tan()`, `tanh()`, `degrees()`, `radians()`, `round()`, `trunc()`, and `factorial()` to `snowflake.snowpark.functions`.
- Added an optional argument `ignore_nulls` to the functions `lead()` and `lag()`.
- The `condition` parameter of the functions `when()` and `iff()` now accepts SQL expressions.
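Among the new functions, `div0()` has the simplest contract: division that returns 0 instead of raising on a zero divisor. A pure-Python sketch of that semantics (local illustration, not Snowpark code):

```python
def div0(dividend: float, divisor: float) -> float:
    """Like Snowflake's DIV0: return 0 instead of raising a
    division-by-zero error when the divisor is 0."""
    return 0.0 if divisor == 0 else dividend / divisor

print(div0(10, 2))  # 5.0
print(div0(10, 0))  # 0.0
```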
Improvements¶
All function and method names have been renamed to use the snake case naming style, which is more Pythonic. For convenience, some camel case names are kept as aliases to the snake case APIs. It is recommended to use the snake case APIs.
- Deprecated these methods on class `Session` and replaced them with their snake case equivalents: `getImports()`, `addImport()`, `removeImport()`, `clearImports()`, `getSessionStage()`, `getDefaultDatabase()`, `getDefaultSchema()`, `getCurrentDatabase()`, and `getFullyQualifiedCurrentSchema()`.
- Deprecated these methods on class `DataFrame` and replaced them with their snake case equivalents: `groupByGroupingSets()`, `naturalJoin()`, `withColumns()`, and `joinTableFunction()`.
- Property `DataFrame.columns` is now consistent with `DataFrame.schema.names` and the Snowflake database identifier requirements.
- `Column.__bool__()` now raises a `TypeError`. This bans the use of the logical operators `and`, `or`, and `not` on `Column` objects. For example, `col("a") > 1 and col("b") > 2` will raise a `TypeError`. Use `(col("a") > 1) & (col("b") > 2)` instead.
- Changed `PutResult` and `GetResult` to subclass `NamedTuple`.
- Fixed a bug that raised an error when the local path or stage location had a space or other special characters.
- Changed `DataFrame.describe()` so that non-numeric and non-string columns are ignored instead of raising an exception.
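The reason `and`/`or` must be banned on column expressions can be demonstrated with a toy class. Python's `and` always calls `__bool__` on the left operand, which cannot produce an expression object, whereas `&` dispatches to `__and__`, which can. The `Expr` class below is a made-up illustration, not Snowpark's `Column`:

```python
class Expr:
    """Toy column-expression class showing why `and`/`or` are banned."""

    def __init__(self, text: str):
        self.text = text

    def __bool__(self):
        # `a and b` calls bool(a), which cannot build an expression.
        raise TypeError("Use & or | instead of and/or on expressions")

    def __and__(self, other):
        # `a & b` dispatches here, so a combined expression can be built.
        return Expr(f"({self.text} AND {other.text})")

a, b = Expr("a > 1"), Expr("b > 2")
print((a & b).text)  # (a > 1 AND b > 2)
try:
    a and b
except TypeError as exc:
    print("as expected:", exc)
```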
Dependency Updates¶
- Updated `snowflake-connector-python` to 2.7.4.
Version 0.3.0¶
Version 0.3.0 of the Snowpark library introduces some new features and improvements.
New Features¶
- Added `Column.isin()` with an alias `Column.in_()`.
- Added `Column.try_cast()`, which is a special version of `cast()`. It tries to cast a string expression to other types and returns `null` if the cast is not possible.
- Added `Column.startswith()` and `Column.substr()` to process string columns.
- `Column.cast()` now also accepts a `str` value to indicate the cast type, in addition to a `DataType` instance.
- Added `DataFrame.describe()` to summarize the stats of a `DataFrame`.
- Added `DataFrame.explain()` to print the query plan of a `DataFrame`.
- `DataFrame.filter()` and `DataFrame.select_expr()` now accept a SQL expression.
- Added a new `bool` parameter called `create_temp_table` to the methods `DataFrame.saveAsTable()` and `Session.write_pandas()` to optionally create a temp table.
- Added `DataFrame.minus()` and `DataFrame.subtract()` as aliases for `DataFrame.except_()`.
- Added `regexp_replace()`, `concat()`, `concat_ws()`, `to_char()`, `current_timestamp()`, `current_date()`, `current_time()`, `months_between()`, `cast()`, `try_cast()`, `greatest()`, `least()`, and `hash()` to the `snowflake.snowpark.functions` module.
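The `try_cast()` contract (return NULL instead of failing when a cast is impossible) can be sketched in plain Python. The `try_cast_int` helper is hypothetical and illustrates the semantics for an integer target type only:

```python
def try_cast_int(value):
    """Like try_cast to an integer type: return None (standing in
    for SQL NULL) when the cast is impossible, instead of raising."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

print(try_cast_int("42"))     # 42
print(try_cast_int("hello"))  # None
```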
Bug Fixes¶
- Fixed an issue where `Session.createDataFrame(pandas_df)` and `Session.write_pandas(pandas_df)` raised an exception when the Pandas DataFrame had spaces in a column name.
- Fixed an issue where `DataFrame.copy_into_table()` sometimes erroneously printed an error-level log entry.
- Fixed an API documentation issue where some DataFrame APIs were missing from the documentation.
Dependency Updates¶
- Updated `snowflake-connector-python` to 2.7.2, which upgrades the `pyarrow` dependency to 6.0.x. Refer to the Python connector 2.7.2 release notes for more information.
Version 0.2.0¶
Version 0.2.0 of the Snowpark library introduces some new features and improvements.
New Features¶
- Added the `createDataFrame()` method for creating a DataFrame from a Pandas DataFrame.
- Added the `write_pandas()` method for writing a Pandas DataFrame to a table in Snowflake and getting a Snowpark DataFrame object back.
- Added new classes and methods for calling window functions.
- Added the new functions `cume_dist()`, to find the cumulative distribution of a value with regard to other values within a window partition, and `row_number()`, which returns a unique row number for each row within a window partition.
- Added functions for computing statistics for DataFrames in the `DataFrameStatFunctions` class.
- Added functions for handling missing values in a DataFrame in the `DataFrameNaFunctions` class.
- Added the new methods `rollup()`, `cube()`, and `pivot()` to the DataFrame class.
- Added the `GroupingSets` class, which you can use with the DataFrame `groupByGroupingSets` method to perform a SQL GROUP BY GROUPING SETS.
- Added the new `FileOperation(session)` class that you can use to upload and download files to and from a stage.
- Added the `copy_into_table()` method for loading data from files in a stage into a table.
- In CASE expressions, the functions `when` and `otherwise` now accept Python types in addition to `Column` objects.
- When you register a UDF, you can now optionally set the `replace` parameter to `True` to overwrite an existing UDF with the same name.
Improvements¶
- UDFs are now compressed before they are uploaded to the server. This makes them about 10 times smaller, which can help when you are using large ML model files.
- When the size of a UDF is less than 8196 bytes, it is uploaded as inline code instead of being uploaded to a stage.
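The size reduction comes from standard compression of the serialized UDF payload (the exact mechanism isn't specified here; this sketch just shows how well repetitive code-like bytes compress, using stdlib `zlib` and a made-up payload):

```python
import zlib

# Stand-in for a serialized UDF: code-like, highly repetitive bytes.
payload = b"def udf(x): return x + 1\n" * 400
compressed = zlib.compress(payload)

print(len(payload), len(compressed))
assert len(compressed) < len(payload)  # repetitive payloads shrink a lot
```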
Bug Fixes¶
- Fixed an issue where the statement `df.select(when(col("a") == 1, 4).otherwise(col("a"))), [Row(4), Row(2), Row(3)]` raised an exception.
- Fixed an issue where `df.toPandas()` raised an exception when a DataFrame was created from large local data.