Snowpark Library for Python release notes for 2024

This article contains the release notes for the Snowpark Library for Python, including the following when applicable:

  • Behavior changes

  • New features

  • Customer-facing bug fixes

Snowflake uses semantic versioning for Snowpark Library for Python updates.

See Snowpark Developer Guide for Python for documentation.

Version 1.26.0 (2024-12-05)

New features

  • Added support for property version and class method get_active_session for Session class.

  • Added new methods and variables to enhance data type handling and JSON serialization/deserialization:

    • To DataType, its derived classes, and StructField:

      • type_name: Returns the type name of the data.

      • simple_string: Provides a simple string representation of the data.

      • json_value: Returns the data as a JSON-compatible value.

      • json: Converts the data to a JSON string.

    • To ArrayType, MapType, StructField, PandasSeriesType, PandasDataFrameType and StructType:

      • from_json: Enables these types to be created from JSON data.

    • To MapType:

      • keyType: keys of the map

      • valueType: values of the map

  • Added support for method appName in SessionBuilder.

  • Added support for include_nulls argument in DataFrame.unpivot.

  • Added support for following functions in functions.py:

    • size to get size of array, object, or map columns.

    • collect_list an alias of array_agg.

    • concat_ws_ignore_nulls to concatenate strings with a separator, ignoring null values.

    • substring makes len argument optional.

  • Added parameter ast_enabled to session for internal usage (default: False).

Improvements

  • Added support for specifying the following to DataFrame.create_or_replace_dynamic_table:

    • iceberg_config A dictionary that can hold the following iceberg configuration options:

      • external_volume

      • catalog

      • base_location

      • catalog_sync

      • storage_serialization_policy

  • Added support for nested data types to DataFrame.print_schema

  • Added support for level parameter to DataFrame.print_schema

  • Improved flexibility of DataFrameReader and DataFrameWriter API by adding support for the following:

    • Added format method to DataFrameReader and DataFrameWriter to specify file format when loading or unloading results.

    • Added load method to DataFrameReader to work in conjunction with format.

    • Added save method to DataFrameWriter to work in conjunction with format.

    • Added support to read keyword arguments to options method for DataFrameReader and DataFrameWriter.

  • Relaxed the cloudpickle dependency for Python 3.11 to simplify build requirements. However, for Python 3.11, cloudpickle==2.2.1 remains the only supported version.

Bug fixes

  • Removed warnings that dynamic pivot features were in private preview, because dynamic pivot is now generally available.

  • Fixed a bug in session.read.options where False Boolean values were incorrectly parsed as True in the generated file format.

Dependency updates

  • Added a runtime dependency on python-dateutil.

Snowpark pandas API updates

New features

  • Added partial support for Series.map when arg is a pandas Series or a collections.abc.Mapping. No support for instances of dict that implement __missing__ but are not instances of collections.defaultdict.

  • Added support for DataFrame.align and Series.align for axis=1 and axis=None.

  • Added support for pd.json_normalize.

  • Added support for GroupBy.pct_change with axis=0, freq=None, and limit=None.

  • Added support for DataFrameGroupBy.__iter__ and SeriesGroupBy.__iter__.

  • Added support for np.sqrt, np.trunc, np.floor, numpy trig functions, np.exp, np.abs, np.positive and np.negative.

  • Added partial support for the dataframe interchange protocol method DataFrame.__dataframe__().

Bug fixes

  • Fixed a bug in df.loc where setting a single column from a series results in unexpected None values.

Improvements

  • Use UNPIVOT INCLUDE NULLS for unpivot operations in pandas instead of sentinel values.

  • Improved documentation for pd.read_excel.

Version 1.25.0 (2024-11-13)

New features

  • Added the following new functions in snowflake.snowpark.dataframe:

    • map

Improvements

  • When target stage is not set in profiler, a default stage from Session.get_session_stage is used instead of raising SnowparkSQLException.

  • Allowed lower case or mixed case input when calling Session.stored_procedure_profiler.set_active_profiler.

  • Added distributed tracing using open telemetry APIs for action function in DataFrame:

    • cache_result

  • Removed opentelemetry warning from logging.

Bug fixes

  • Fixed the pre-action and post-action query propagation when In expressions were used in selects.

  • Fixed a bug that raised error AttributeError while calling Session.stored_procedure_profiler.get_output when Session.stored_procedure_profiler is disabled.

Dependency updates

  • Added a dependency on protobuf>=5.28 and tzlocal at runtime.

  • Added a dependency on protoc-wheel-0 for the development profile.

  • Require snowflake-connector-python>=3.12.0, <4.0.0 (was >=3.10.0).

Snowpark pandas API updates

New features

  • Added support for Index.to_numpy.

  • Added support for DataFrame.align and Series.align for axis=0.

  • Added support for snowflake.snowpark.functions.window

  • Added support for pd.read_pickle (Uses native pandas for processing).

  • Added support for pd.read_html (Uses native pandas for processing).

  • Added support for pd.read_xml (Uses native pandas for processing).

  • Added support for aggregation functions "size" and len in GroupBy.aggregate, DataFrame.aggregate, and Series.aggregate.

  • Added support for list values in Series.str.len.

Bug fixes

  • Fixed a bug where aggregating a single-column dataframe with a single callable function (e.g. pd.DataFrame([0]).agg(np.mean)) would fail to transpose the result.

  • Fixed bugs where DataFrame.dropna() would:

    • Treat an empty subset (e.g. []) as if it specified all columns instead of no columns.

    • Raise a TypeError for a scalar subset instead of filtering on just that column.

    • Raise a ValueError for a subset of type pandas.Index instead of filtering on the columns in the index.

  • Creation of scoped read-only tables to mitigate TableNotFoundError when using dynamic pivot in a notebook environment.

  • Fixed a bug when concat dataframe or series objects are coming from the same dataframe when axis = 1.

Improvements

  • Improve np.where with scalar x value by eliminating unnecessary join and temp table creation.

  • Improve get_dummies performance by flattening the pivot with join.

Snowpark local testing updates

New features

  • Added support for patching functions that are unavailable in the snowflake.snowpark.functions module.

  • Added support for snowflake.snowpark.functions.any_value

Bug fixes

  • Fixed a bug where Table.update could not handle VariantType, MapType, and ArrayType data types.

  • Fixed a bug where column aliases were incorrectly resolved in DataFrame.join, causing errors when selecting columns from a joined DataFrame.

  • Fixed a bug where Table.update and Table.merge could fail if the target table’s index was not the default RangeIndex.

Version 1.24.0 (2024-10-28)

New features

  • Updated Session class to be thread-safe. This allows concurrent DataFrame transformations, DataFrame actions, UDF and stored procedure registration, and concurrent file uploads when using the same Session object.

    • The feature is disabled by default and can be enabled by setting FEATURE_THREAD_SAFE_PYTHON_SESSION to True for account.

    • Updating session configurations, like changing database or schema, when multiple threads are using the session may lead to unexpected behavior.

    • When enabled, some internally created temporary table names returned from DataFrame.queries API are not deterministic, and may be different when DataFrame actions are executed. This does not affect explicit user-created temporary tables.

  • Added support for ‘Service’ domain to session.lineage.trace API.

  • Added support for the following methods in DataFrameWriter to support daisy-chaining:

    • option

    • options

    • partition_by

  • Added support for snowflake_cortex_summarize.

Improvements

  • Improved the following new capability for function snowflake.snowpark.functions.array_remove so that it is now possible to use in python.

  • Disables SQL simplification when sort is performed after limit.

    • Previously, df.sort().limit() and df.limit().sort() generated the same query with sort in front of limit. Now, df.limit().sort() generates a query that reads df.limit().sort().

    • Improve performance of generated queries for df.limit().sort() because limit stops table scanning as soon as the number of records is satisfied.

Bug fixes

  • Fixed a bug where the automatic cleanup of temporary tables could interfere with the results of async query execution.

  • Fixed a bug in DataFrame.analytics.time_series_agg function to handle multiple data points in same sliding interval.

  • Fixed a bug that created inconsistent casing in field names of structured objects in iceberg schemas.

Deprecations

Snowpark pandas API updates

New features

  • Added support for np.subtract, np.multiply, np.divide, and np.true_divide.

  • Added support for tracking usages of __array_ufunc__.

  • Added numpy compatibility support for np.float_power, np.mod, np.remainder, np.greater, np.greater_equal, np.less, np.less_equal, np.not_equal, and np.equal.

  • Added numpy compatibility support for np.log, np.log2, and np.log10

  • Added support for DataFrameGroupBy.bfill, SeriesGroupBy.bfill, DataFrameGroupBy.ffill, and SeriesGroupBy.ffill.

  • Added support for on parameter with Resampler.

  • Added support for timedelta inputs in value_counts().

  • Added support for applying Snowpark Python function snowflake_cortex_summarize.

  • Added support for DataFrame.attrs and Series.attrs.

  • Added support for DataFrame.style.

  • Added numpy compatibility support for np.full_like

Improvements

  • Improved generated SQL query for head and iloc when the row key is a slice.

  • Improved error message when passing an unknown timezone to tz_convert and tz_localize in Series, DataFrame, Series.dt, and DatetimeIndex.

  • Improved documentation for tz_convert and tz_localize in Series, DataFrame, Series.dt, and DatetimeIndex to specify the supported timezone formats.

  • Added additional kwargs support for df.apply and series.apply ( as well as map and applymap ) when using snowpark functions. This allows for some position independent compatibility between apply and functions where the first argument is not a pandas object.

  • Improved generated SQL query for iloc and iat when the row key is a scalar.

  • Removed all joins in iterrows.

  • Improved documentation for Series.map to reflect the unsupported features.

  • Added support for np.may_share_memory which is used internally by many scikit-learn functions. This method will always return false when called with a Snowpark pandas object.

Bug Fixes

  • Fixed a bug where DataFrame and Series pct_change() would raise TypeError when input contained timedelta columns.

  • Fixed a bug where replace() would sometimes propagate Timedelta types incorrectly through replace(). Instead raise NotImplementedError for replace() on Timedelta.

  • Fixed a bug where DataFrame and Series round() would raise AssertionError for Timedelta columns. Instead raise NotImplementedError for round() on Timedelta.

  • Fixed a bug where reindex fails when the new index is a Series with non-overlapping types from the original index.

  • Fixed a bug where calling __getitem__ on a DataFrameGroupBy object always returned a DataFrameGroupBy object if as_index=False.

  • Fixed a bug where inserting timedelta values into an existing column would silently convert the values to integers instead of raising NotImplementedError.

  • Fixed a bug where DataFrame.shift() on axis=0 and axis=1 would fail to propagate timedelta types.

  • DataFrame.abs(), DataFrame.__neg__(), DataFrame.stack(), and DataFrame.unstack() now raise NotImplementedError for timedelta inputs instead of failing to propagate timedelta types.

Snowpark local testing updates

Bug fixes

  • Fixed a bug where DataFrame.alias raises KeyError for input column name.

  • Fixed a bug where to_csv on Snowflake stage fails when data contains empty strings.

Version 1.23.0 (2024-10-09)

New features

  • Added the following new functions in snowflake.snowpark.functions:

    • make_interval

  • Added support for using Snowflake Interval constants with Window.range_between() when the order by column is TIMESTAMP or DATE type.

  • Added support for file writes. This feature is currently in private preview.

  • Added thread_id to QueryRecord to track the thread id submitting the query history.

  • Added support for Session.stored_procedure_profiler.

Bug fixes

  • Fixed a bug where registering a stored procedure or UDxF with type hints would give a warning NoneType has no len() when trying to read default values from function.

Snowpark pandas API updates

New features

  • Added support for TimedeltaIndex.mean method.

  • Added support for some cases of aggregating Timedelta columns on axis=0 with agg or aggregate.

  • Added support for by, left_by, right_by, left_index, and right_index for pd.merge_asof.

  • Added support for passing parameter include_describe to Session.query_history.

  • Added support for DatetimeIndex.mean and DatetimeIndex.std methods.

  • Added support for Resampler.asfreq, Resampler.indices, Resampler.nunique, and Resampler.quantile.

  • Added support for resample frequency W, ME, YE with closed = "left".

  • Added support for DataFrame.rolling.corr and Series.rolling.corr for pairwise = False and int window.

  • Added support for string time-based window and min_periods = None for Rolling.

  • Added support for DataFrameGroupBy.fillna and SeriesGroupBy.fillna.

  • Added support for constructing Series and DataFrame objects with the lazy Index object as data, index, and columns arguments.

  • Added support for constructing Series and DataFrame objects with index and column values not present in DataFrame/Series data.

  • Added support for pd.read_sas (Uses native pandas for processing).

  • Added support for applying rolling().count() and expanding().count() to Timedelta series and columns.

  • Added support for tz in both pd.date_range and pd.bdate_range.

  • Added support for Series.items.

  • Added support for errors="ignore" in pd.to_datetime.

  • Added support for DataFrame.tz_localize and Series.tz_localize.

  • Added support for DataFrame.tz_convert and Series.tz_convert.

  • Added support for applying Snowpark Python functions (e.g., sin) in Series.map, Series.apply, DataFrame.apply and DataFrame.applymap.

Improvements

  • Improved to_pandas to persist the original timezone offset for TIMESTAMP_TZ type.

  • Improved dtype results for TIMESTAMP_TZ type to show correct timezone offset.

  • Improved dtype results for TIMESTAMP_LTZ type to show correct timezone.

  • Improved error message when passing non-bool value to numeric_only for groupby aggregations.

  • Removed unnecessary warning about sort algorithm in sort_values.

  • Use SCOPED object for internal create temp tables. The SCOPED objects will be stored sproc scoped if created within stored sproc, otherwise will be session scoped, and the object will be automatically cleaned at the end of the scope.

  • Improved warning messages for operations that lead to materialization with inadvertent slowness.

  • Removed unnecessary warning message about convert_dtype in Series.apply.

Bug fixes

  • Fixed a bug where an Index object created from a Series/DataFrame incorrectly updates the Series/DataFrame’s index name after an inplace update has been applied to the original Series/DataFrame.

  • Suppressed an unhelpful SettingWithCopyWarning that sometimes appeared when printing Timedelta columns.

  • Fixed inplace argument for Series objects derived from other Series objects.

  • Fixed a bug where Series.sort_values failed if series name overlapped with index column name.

  • Fixed a bug where transposing a dataframe would map Timedelta index levels to integer column levels.

  • Fixed a bug where Resampler methods on timedelta columns would produce integer results.

  • Fixed a bug where pd.to_numeric() would leave Timedelta inputs as Timedelta instead of converting them to integers.

  • Fixed loc set when setting a single row, or multiple rows, of a DataFrame with a Series value.

Version 1.22.1 (2024-09-11)

  • This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.

Version 1.22.0 (2024-09-10)

New features

  • Added the following new functions in snowflake.snowpark.functions:

    • array_remove

    • ln

Improvements

  • Improved documentation for Session.write_pandas by making the use_logical_type option more explicit.

  • Added support for specifying the following to DataFrameWriter.save_as_table:

    • enable_schema_evolution

    • data_retention_time

    • max_data_extension_time

    • change_tracking

    • copy_grants

    • iceberg_config - A dictionary that can hold the following iceberg configuration options:

      • external_volume

      • catalog

      • base_location

      • catalog_sync

      • storage_serialization_policy

  • Added support for specifying the following to DataFrameWriter.copy_into_table:

    • iceberg_config - A dictionary that can hold the following iceberg configuration options:

      • external_volume

      • catalog

      • base_location

      • catalog_sync

      • storage_serialization_policy

  • Added support for specifying the following parameters to DataFrame.create_or_replace_dynamic_table:

    • mode

    • refresh_mode

    • initialize

    • clustering_keys

    • is_transient

    • data_retention_time

    • max_data_extension_time

Bug fixes

  • Fixed a bug in session.read.csv that caused an error when setting PARSE_HEADER = True in an externally defined file format.

  • Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.

  • Fixed a bug in session.get_session_stage that referenced a non-existing stage after switching database or schema.

  • Fixed a bug where calling DataFrame.to_snowpark_pandas without explicitly initializing the Snowpark pandas plugin caused an error.

  • Fixed a bug where using the explode function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the outer parameter.

Snowpark local testing updates

New features

  • Added support for type coercion when passing columns as input to UDF calls.

  • Added support for Index.identical.

Bug fixes

  • Fixed a bug where the truncate mode in DataFrameWriter.save_as_table incorrectly handled DataFrames containing only a subset of columns from the existing table.

  • Fixed a bug where function to_timestamp does not set the default timezone of the column datatype.

Snowpark pandas API updates

New features

  • Added limited support for the Timedelta type, including the following features. Snowpark pandas will raise NotImplementedError for unsupported Timedelta use cases.

    • support for tracking the Timedelta type through copy, cache_result, shift, sort_index, assign, bfill, ffill, fillna, compare, diff, drop, dropna, duplicated, empty, equals, insert, isin, isna, items, iterrows, join, len, mask, melt, merge, nlargest, nsmallest, to_pandas.

    • support for converting non-timedelta to timedelta via astype.

    • NotImplementedError will be raised for the rest of methods that do not support Timedelta.

    • support for subtracting two timestamps to get a Timedelta.

    • support indexing with Timedelta data columns.

    • support for adding or subtracting timestamps and Timedelta.

    • support for binary arithmetic between two Timedelta values.

    • support for binary arithmetic and comparisons between Timedelta values and numeric values.

    • support for lazy TimedeltaIndex.

    • support for pd.to_timedelta.

    • support for GroupBy aggregations min, max, mean, idxmax, idxmin, std, sum, median, count, any, all, size, nunique, head, tail, aggregate.

    • support for GroupBy filtrations first and last.

    • support for TimedeltaIndex attributes: days, seconds, microseconds and nanoseconds.

    • support for diff with timestamp columns on axis=0 and axis=1.

    • support for TimedeltaIndex methods: ceil, floor and round.

    • support for TimedeltaIndex.total_seconds method.

  • Added support for index’s arithmetic and comparison operators.

  • Added support for Series.dt.round.

  • Added documentation pages for DatetimeIndex.

  • Added support for Index.name, Index.names, Index.rename, and Index.set_names.

  • Added support for Index.__repr__.

  • Added support for DatetimeIndex.month_name and DatetimeIndex.day_name.

  • Added support for Series.dt.weekday, Series.dt.time, and DatetimeIndex.time.

  • Added support for Index.min and Index.max.

  • Added support for pd.merge_asof.

  • Added support for Series.dt.normalize and DatetimeIndex.normalize.

  • Added support for Index.is_boolean, Index.is_integer, Index.is_floating, Index.is_numeric, and Index.is_object.

  • Added support for DatetimeIndex.round, DatetimeIndex.floor and DatetimeIndex.ceil.

  • Added support for Series.dt.days_in_month and Series.dt.daysinmonth.

  • Added support for DataFrameGroupBy.value_counts and SeriesGroupBy.value_counts.

  • Added support for Series.is_monotonic_increasing and Series.is_monotonic_decreasing.

  • Added support for Index.is_monotonic_increasing and Index.is_monotonic_decreasing.

  • Added support for pd.crosstab.

  • Added support for pd.bdate_range and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both pd.date_range and pd.bdate_range.

  • Added support for lazy Index objects as labels in DataFrame.reindex and Series.reindex.

  • Added support for Series.dt.days, Series.dt.seconds, Series.dt.microseconds, and Series.dt.nanoseconds.

  • Added support for creating a DatetimeIndex from an Index of numeric or string type.

  • Added support for string indexing with Timedelta objects.

  • Added support for Series.dt.total_seconds method.

Improvements

  • Improved concat and join performance when operations are performed on a series coming from the same DataFrame by avoiding unnecessary joins.

  • Refactored quoted_identifier_to_snowflake_type to avoid making metadata queries if the types have been cached locally.

  • Improved pd.to_datetime to handle all local input cases.

  • Create a lazy index from another lazy index without pulling data to client.

  • Raised NotImplementedError for Index bitwise operators.

  • Display a more clear error message when Index.names is set to a non-list-like object.

  • Raise a warning whenever MultiIndex values are pulled in locally.

  • Improved warning message for pd.read_snowflake to include the creation reason when temp table creation is triggered.

  • Improved performance for DataFrame.set_index, or setting DataFrame.index or Series.index by avoiding checks that require eager evaluation. As a consequence, when the new index that does not match the current Series or DataFrame object length, a ValueError is no longer raised. Instead, when the Series or DataFrame object is longer than the provided index, the new index of the Series or DataFrame is filled with NaN values for the “extra” elements. Otherwise, the extra values in the provided index are ignored.

Bug fixes

  • Stopped ignoring nanoseconds in pd.Timedelta scalars.

  • Fixed AssertionError in tree of binary operations.

  • Fixed bug in Series.dt.isocalendar using a named Series

  • Fixed inplace argument for Series objects derived from DataFrame columns.

  • Fixed a bug where Series.reindex and DataFrame.reindex did not update the result index’s name correctly.

  • Fixed a bug where Series.take did not give an error when axis=1 was specified.

Version 1.21.1 (2024-09-05)

Bug fixes

  • Fixed a bug where using to_pandas_batches with async jobs caused an error due to improper handling of waiting for asynchronous query completion.

Version 1.21.0 (2024-08-19)

New features

  • Added support for snowflake.snowpark.testing.assert_dataframe_equal, which is a utility function to check the equality of two Snowpark DataFrames.

Improvements

  • Added support for server-side string size limitations.

  • Added support for creating and invoking stored procedures, UDFs and UDTFs with optional arguments.

  • Added support for column lineage in the DataFrame.lineage.trace API.

  • Added support for passing INFER_SCHEMA options to DataFrameReader via INFER_SCHEMA_OPTIONS.

  • Added support for passing parameters parameter to Column.rlike and Column.regexp.

  • Added support for automatically cleaning up temporary tables created by df.cache_result() in the current session when the DataFrame is no longer referenced (i.e., gets garbage collected). It is still an experimental feature and not enabled by default. It can be enabled by setting session.auto_clean_up_temp_table_enabled to True.

  • Added support for string literals to the fmt parameter of snowflake.snowpark.functions.to_date.

Bug fixes

  • Fixed a bug where the SQL generated for selecting * column has an incorrect subquery.

  • Fixed a bug in DataFrame.to_pandas_batches where the iterator could throw an error if a certain transformation is made to the pandas DataFrame due to the wrong isolation level.

  • Fixed a bug in DataFrame.lineage.trace to split the quoted feature view’s name and version correctly.

  • Fixed a bug in Column.isin that caused invalid SQL generation when passed an empty list.

  • Fixed a bug that fails to raise NotImplementedError while setting a cell with a list-like item.

Snowpark local testing updates

New features

  • Added support for the following APIs:

    • snowflake.snowpark.functions

      • rank

      • dense_rank

      • percent_rank

      • cume_dist

      • ntile

      • datediff

      • array_agg

    • snowflake.snowpark.column.Column.within_group

  • Added support for parsing flags in Regex statements for mocked plans. This maintains parity with the rlike and regexp changes above.

Bug fixes

  • Fixed a bug where the window functions LEAD and LAG do not handle the option ignore_nulls properly.

  • Fixed a bug where values were not populated into the result DataFrame during the insertion of a table merge operation.

Improvements

  • Fix pandas FutureWarning about integer indexing.

Snowpark pandas API updates

New features

  • Added support for DataFrame.backfill, DataFrame.bfill, Series.backfill, and Series.bfill.

  • Added support for DataFrame.compare and Series.compare with default parameters.

  • Added support for Series.dt.microsecond and Series.dt.nanosecond.

  • Added support for Index.is_unique and Index.has_duplicates.

  • Added support for Index.equals.

  • Added support for Index.value_counts.

  • Added support for Series.dt.day_name and Series.dt.month_name.

  • Added support for indexing on Index, e.g., df.index[:10].

  • Added support for DataFrame.unstack and Series.unstack.

  • Added support for DataFrame.asfreq and Series.asfreq.

  • Added support for Series.dt.is_month_start and Series.dt.is_month_end.

  • Added support for Index.all and Index.any.

  • Added support for Series.dt.is_year_start and Series.dt.is_year_end.

  • Added support for Series.dt.is_quarter_start and Series.dt.is_quarter_end.

  • Added support for lazy DatetimeIndex.

  • Added support for Series.argmax and Series.argmin.

  • Added support for Series.dt.is_leap_year.

  • Added support for DataFrame.items.

  • Added support for Series.dt.floor and Series.dt.ceil.

  • Added support for Index.reindex.

  • Added support for DatetimeIndex properties: year, month, day, hour, minute, second, microsecond, nanosecond, date, dayofyear, day_of_year, dayofweek, day_of_week, weekday, quarter, is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end and is_leap_year.

  • Added support for Resampler.fillna and Resampler.bfill.

  • Added limited support for the Timedelta type, including creating Timedelta columns and to_pandas.

  • Added support for Index.argmax and Index.argmin.

Improvements

  • Removed the public preview warning message when importing Snowpark pandas.

  • Removed unnecessary count query from the SnowflakeQueryCompiler.is_series_like method.

  • Dataframe.columns now returns a native pandas Index object instead of a Snowpark Index object.

  • Refactored and introduced query_compiler argument in Index constructor to create Index from query compiler.

  • pd.to_datetime now returns a DatetimeIndex object instead of a Series object.

  • pd.date_range now returns a DatetimeIndex object instead of a Series object.

Bug fixes

  • Made passing an unsupported aggregation function to pivot_table raise NotImplementedError instead of KeyError.

  • Removed axis labels and callable names from error messages and telemetry about unsupported aggregations.

  • Fixed AssertionError in Series.drop_duplicates and DataFrame.drop_duplicates when called after sort_values.

  • Fixed a bug in Index.to_frame where the result frame’s column name may be wrong where name is unspecified.

  • Fixed a bug where some Index docstrings are ignored.

  • Fixed a bug in Series.reset_index(drop=True) where the result name may be wrong.

  • Fixed a bug in Groupby.first/last ordering by the correct columns in the underlying window expression.

Version 1.20.0 (2024-07-17)

Version 1.20.0 of the Snowpark Library for Python introduces some new features.

New features

  • Added distributed tracing using open telemetry APIs for table stored procedure functions in DataFrame:

    • _execute_and_get_query_id

  • Added support for the arrays_zip function.

  • Improved performance for binary column expressions and df._in by avoiding unnecessary casts for numeric values. You can enable this optimization by setting session.eliminate_numeric_sql_value_cast_enabled = True.

  • Improved error messages for write_pandas when the target table does not exist and auto_create_table=False.

  • Added open telemetry tracing on UDxF functions in Snowpark.

  • Added open telemetry tracing on stored procedure registration in Snowpark.

  • Added a new optional parameter called format_json to the Session.SessionBuilder.app_name function that sets the app name in the Session.query_tag in JSON format. By default, this parameter is set to False.

Bug fixes

  • Fixed a bug where the SQL generated for lag(x, 0) was incorrect and failed with the error message argument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'.

Snowpark local testing updates

New features

  • Added support for the following APIs:

    • snowflake.snowpark.functions

      • random

  • Added new parameters to the patch function when registering a mocked function:

    • distinct allows an alternate function to be specified for when a SQL function should be distinct.

    • pass_column_index passes a named parameter, column_index, to the mocked function that contains the pandas.Index for the input data.

    • pass_row_index passes a named parameter, row_index, to the mocked function that is the 0-indexed row number on which the function is currently operating.

    • pass_input_data passes a named parameter, input_data, to the mocked function that contains the entire input dataframe for the current expression.

    • Added support for the column_order parameter in the DataFrameWriter.save_as_table method.

Bug fixes

  • Fixed a bug that caused DecimalType columns to be incorrectly truncated to integer precision when used in BinaryExpressions.

Snowpark pandas API Updates

New features

  • Added new API support for the following:

    • DataFrames

      • DataFrame.nlargest and DataFrame.nsmallest

      • DataFrame.assign

      • DataFrame.stack

      • DataFrame.pivot

      • DataFrame.to_csv

      • DataFrame.corr

      • DataFrame.corr

      • DataFrame.equals

      • DataFrame.reindex

      • DataFrame.at and DataFrame.iat

    • Series

      • Series.nlargest and Series.nsmallest

      • Series.at and Series.iat

      • Series.dt.isocalendar

      • Series.equals

      • Series.reindex

      • Series.to_csv

      • Series.case_when except when condition or replacement is callable

      • series.plot() with data materialized the data to the local client

    • GroupBy

      • DataFrameGroupBy.all and DataFrameGroupBy.any

      • DataFrameGroupBy and SeriesGroupBy aggregations first and last

      • DataFrameGroupBy.get_group

      • SeriesGroupBy.all and SeriesGroupBy.any

    • General

      • pd.pivot

      • read_excel (Uses local pandas for processing)

      • df.plot() with data materialized the data to the local client

  • Extended existing APIs as follows:

    • Added support for replace and frac > 1 in DataFrame.sample and Series.sample.

    • Added partial support for Series.str.translate where the values in the table are single-codepoint strings.

    • Added support for limit parameter when method parameter is used in fillna.

  • Added documentation pages for Index and its APIs.

Bug fixes

  • Fixed an issue when using np.where and df.where when the scalar other is the literal 0.

  • Fixed a bug regarding precision loss when converting to Snowpark pandas DataFrame or Series with dtype=np.uint64.

  • Fixed a bug where values is set to index when index and columns contain all columns in DataFrame during pivot_table.

Improvements

  • Added support for Index.copy().

  • Added support for Index APIs: dtype, values, item(), tolist(), to_series() and to_frame().

  • Expand support for DataFrames with no rows in pd.pivot_table and DataFrame.pivot_table.

  • Added support for inplace parameter in DataFrame.sort_index and Series.sort_index.

Version 1.19.0 (2024-06-25)

Version 1.19.0 of the Snowpark Library for Python introduces some new features.

New features

  • Added support for the to_boolean function.

  • Added documentation pages for Index and its APIs.

Bug fixes

  • Fixed a bug where Python stored procedures with tables return type fails when run in a task.

  • Fixed a bug where df.dropna fails due to RecursionError: maximum recursion depth exceeded when the DataFrame has more than 500 columns.

  • Fixed a bug where AsyncJob.result("no_result") doesn’t wait for the query to finish execution.

Local testing updates

New features

  • Added support for the strict parameter when registering UDFs and Stored Procedures.

Bug fixes

  • Fixed a bug in convert_timezone that made setting the source_timezone parameter return an error.

  • Fixed a bug where creating a DataFrame with empty data of type DateType raises AttributeError.

  • Fixed a bug where table merge fails when an update clause exists but no update takes place.

  • Fixed a bug in the mock implementation of to_char that raises IndexError when an incoming column has a nonconsecutive row index.

  • Fixed a bug in handling CaseExpr expressions that raises IndexError when an incoming column has a nonconsecutive row index.

  • Fixed a bug in the implementation of Column.like that raises IndexError when an incoming column has a nonconsecutive row index.

Improvements

  • Added support for type coercion in the implementation of DataFrame.replace, DataFrame.dropna, and the mock function iff.

Snowpark pandas API updates

New features

  • Added partial support for DataFrame.pct_change and Series.pct_change without the freq and limit parameters.

  • Added support for Series.str.get.

  • Added support for Series.dt.dayofweek, Series.dt.day_of_week, Series.dt.dayofyear, and Series.dt.day_of_year.

  • Added support for Series.str.__getitem__ (Series.str[...]).

  • Added support for Series.str.lstrip and Series.str.rstrip.

  • Added support for DataFrameGroupby.size and SeriesGroupby.size.

  • Added support for DataFrame.expanding and Series.expanding for aggregations count, sum, min, max, mean, std, and var with axis=0.

  • Added support for DataFrame.rolling and Series.rolling for aggregation count with axis=0.

  • Added support for Series.str.match.

  • Added support for DataFrame.resample and Series.resample for aggregation size.

Bug fixes

  • Fixed a bug that causes output of GroupBy.aggregate columns to be ordered incorrectly.

  • Fixed a bug where calling DataFrame.describe on a frame with duplicate columns of differing dtypes could cause an error or incorrect results.

  • Fixed a bug in DataFrame.rolling and Series.rolling so window=0 now throws NotImplementedError instead of ValueError

Improvements

  • Added support for named aggregations in DataFrame.aggregate and Series.aggregate with axis=0.

  • pd.read_csv reads using the native pandas CSV parser, then uploads data to Snowflake using parquet. This enables most of the parameters supported by read_csv, including date parsing and numeric conversions. Uploading via parquet is roughly twice as fast as uploading via CSV.

  • Initial work to support a pd.Index directly in Snowpark pandas. Support for pd.Index as a first-class component of Snowpark pandas is under active development.

  • Added a lazy index constructor and support for len, shape, size, empty, to_pandas(), and names. For df.index, Snowpark pandas creates a lazy index object.

  • For df.columns, Snowpark pandas supports a non-lazy version of an Index as the data is already stored locally.

Version 1.18.0 (2024-05-28)

Version 1.18.0 of the Snowpark library introduces some new features.

New features

  • Added the DataFrame.cache_result and Series.cache_result methods for users to persist DataFrame and Series objects to a temporary table for the duration of a session to improve latency of subsequent operations.

Improvements

  • Added support for DataFrame.pivot_table with no index parameter and with the margins parameter.

  • Updated the signature of DataFrame.shift, Series.shift, DataFrameGroupBy.shift, and SeriesGroupBy.shift to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added suffix argument or sequence values of periods.

  • Re-added support for Series.str.split.

Bug fixes

  • Fixed an issue with mixed columns for string methods (Series.str.*).

Local testing updates

New features

  • Added support for the following DataFrameReader read options to file formats CSV and JSON:

    • PURGE

    • PATTERN

    • INFER_SCHEMA with value False

    • ENCODING with value UTF8

  • Added support for DataFrame.analytics.moving_agg and DataFrame.analytics.cumulative_agg_agg.

  • Added support for the if_not_exists parameter during UDF and stored procedure registration.

Bug fixes

  • Fixed a bug with processing time formats where the fractional second part was not handled properly.

  • Fixed a bug that caused function calls on * to fail.

  • Fixed a bug that prevented the creation of map and struct type objects.

  • Fixed a bug where the function date_add was unable to handle some numeric types.

  • Fixed a bug where TimestampType casting resulted in incorrect data.

  • Fixed a bug that caused DecimalType data to have incorrect precision in some cases.

  • Fixed a bug where referencing a missing table or view raised an IndexError.

  • Fixed a bug where the mocked function to_timestamp_ntz could not handle None data.

  • Fixed a bug where mocked UDFs handled output data of None improperly.

  • Fixed a bug where DataFrame.with_column_renamed ignored attributes from parent DataFrames after join operations.

  • Fixed a bug where the integer precision of large values was lost when converted to a pandas DataFrame.

  • Fixed a bug where the schema of a datetime object was wrong when creating a DataFrame from a pandas DataFrame.

  • Fixed a bug in the implementation of Column.equal_nan where null data was handled incorrectly.

  • Fixed a bug where DataFrame.drop ignored attributes from parent DataFrames after join operations.

  • Fixed a bug in mocked function date_part where column type was set incorrectly.

  • Fixed a bug where DataFrameWriter.save_as_table did not raise exceptions when inserting null data into non-nullable columns.

  • Fixed a bug in the implementation of DataFrameWriter.save_as_table where:

    • Append or truncate failed when incoming data had a different schema than the existing table.

    • Truncate failed when incoming data did not specify columns that are nullable.

Improvements

  • Removed the dependency check for pyarrow because it is not used.

  • Improved the target type coverage of Column.cast, adding support for casting to boolean and all integral types.

  • Aligned the error experience when calling UDFs and stored procedures.

  • Added appropriate error messages for the is_permanent and anonymous options in UDFs and stored procedures registration to make it clearer that those features are not yet supported.

  • File read operations with unsupported options and values now raise NotImplementedError instead of warnings and unclear error information.

Version 1.17.0 (2024-05-21)

Version 1.17.0 of the Snowpark library introduces some new features.

New features

  • Added support to add a comment on tables and views using the functions listed below:

    • DataFrameWriter.save_as_table

    • DataFrame.create_or_replace_view

    • DataFrame.create_or_replace_temp_view

    • DataFrame.create_or_replace_dynamic_table

Improvements

  • Improved error message to remind users to set {"infer_schema": True} when reading CSV file without specifying its schema.

Local testing updates

New features

  • Added support for NumericType and VariantType data conversion in the mocked function to_timestamp_ltz, to_timestamp_ntz, to_timestamp_tz and to_timestamp.

  • Added support for DecimalType, BinaryType, ArrayType, MapType, TimestampType, DateType and TimeType data conversion in the mocked function to_char.

  • Added support for the following APIs:

    • snowflake.snowpark.functions.to_varchar

    • snowflake.snowpark.DataFrame.pivot

    • snowflake.snowpark.Session.cancel_all

  • Introduced a new exception class snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException.

  • Added support for casting to FloatType.

Bug fixes

  • Fixed a bug that stored procedures and UDFs should not remove imports already in the sys.path during the clean-up step.

  • Fixed a bug that when processing datetime format, the fractional second part is not handled properly.

  • Fixed a bug where file operations on the Windows platform were unable to properly handle file separators in directory names.

  • Fixed a bug that on the Windows platform that, when reading a pandas dataframe, an IntervalType column with integer data can not be processed.

  • Fixed a bug that prevented users from being able to select multiple columns with the same alias.

  • Fixed a bug where Session.get_current_[schema|database|role|user|account|warehouse] returns uppercased identifiers when identifiers are quoted.

  • Fixed a bug that function substr and substring can not handle a zero-based start_expr.

Improvements

  • Standardized the error experience by raising SnowparkLocalTestingException in error cases, which is on par with the SnowparkSQLException raised in non-local execution.

  • Improved the error experience of the Session.write_pandas method so that NotImplementError will be raised when called.

  • Aligned the error experience with reusing a closed session in non-local execution.

Version 1.16.0 (2024-05-08)

Version 1.16.0 of the Snowpark library introduces some new features.

New features

  • Added snowflake.snowpark.Session.lineage.trace to explore data lineage of Snowflake objects.

  • Added support for registering stored procedures with packages given as Python modules.

  • Added support for structured type schema parsing.

Bug fixes

  • Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.

Local testing updates

New features

  • Added support for StringType, TimestampType and VariantType data conversion in the mocked function to_date.

  • Added support for the following APIs:

    • snowflake.snowpark.functions:

      • get

      • concat

      • concat_ws

Bug fixes

  • Fixed a bug that caused NaT and NaN values to not be recognized.

  • Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.

  • Fixed a bug where DataFrameReader.csv was unable to handle quoted values containing a delimiter.

  • Fixed a bug that when there is a None value in an arithmetic calculation, the output should remain None instead of math.nan.

  • Fixed a bug in function sum and covar_pop that when there is a math.nan value in the data, the output should also be math.nan.

  • Fixed a bug where stage operations can not handle directories.

  • Fixed a bug that DataFrame.to_pandas should take Snowflake numeric types with precision 38 as int64.

Version 1.15.0 (2024-04-24)

Version 1.15.0 of the Snowpark library introduces some new features.

New features

  • Added truncate save mode in DataFrameWrite to overwrite existing tables by truncating the underlying table instead of dropping it.

  • Added telemetry to calculate query plan height and number of duplicate nodes during collect operations.

  • Added the functions below to unload data from a DataFrame into one or more files in a stage:

    • DataFrame.write.json

    • DataFrame.write.csv

    • DataFrame.write.parquet

  • Added distributed tracing using open telemetry APIs for action functions in DataFrame and DataFrameWriter:

    • snowflake.snowpark.DataFrame:

      • collect

      • collect_nowait

      • to_pandas

      • count

      • show

    • snowflake.snowpark.DataFrameWriter:

      • save_as_table

  • Added support for snow:// URLs to snowflake.snowpark.Session.file.get and snowflake.snowpark.Session.file.get_stream

  • Added support to register stored procedures and UDFs with a comment.

  • UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.

  • Added support for dynamic pivot. This feature is currently in private preview.

Improvements

  • Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). It is still an experimental feature and it is not enabled by default. You can enable it by setting session.cte_optimization_enabled to True.

Bug fixes

  • Fixed a bug where statement_params is not passed to query executions that register stored procedures and user defined functions.

  • Fixed a bug causing snowflake.snowpark.Session.file.get_stream to fail for quoted stage locations.

  • Fixed a bug that an internal type hint in utils.py might raise AttributeError when the underlying module can not be found.

Local testing updates

New features

  • Added support for registering UDFs and stored procedures.

  • Added support for the following APIs:

    • snowflake.snowpark.Session:

      • file.put

      • file.put_stream

      • file.get

      • file.get_stream

      • read.json

      • add_import

      • remove_import

      • get_imports

      • clear_imports

      • add_packages

      • add_requirements

      • clear_packages

      • remove_package

      • udf.register

      • udf.register_from_file

      • sproc.register

      • sproc.register_from_file

    • snowflake.snowpark.functions

      • current_database

      • current_session

      • date_trunc

      • object_construct

      • object_construct_keep_null

      • pow

      • sqrt

      • udf

      • sproc

  • Added support for StringType, TimestampType and VariantType data conversion in the mocked function to_time.

Bug fixes

  • Fixed a bug that null filled columns for constant functions.

  • Fixed to_object, to_array and to_binary to better handle null inputs.

  • Fixed a bug that timestamp data comparison can not handle years beyond 2262.

  • Fixed a bug that Session.builder.getOrCreate should return the created mock session.

Version 1.14.0 (2024-03-20)

Version 1.14.0 of the Snowpark library introduces some new features.

New features

  • Added support for creating vectorized UDTFs with the process method.

  • Added support for dataframe functions:

    • to_timestamp_ltz

    • to_timestamp_ntz

    • to_timestamp_tz

    • locate

  • Added support for ASOF JOIN type.

  • Added support for the following local testing APIs:

    • snowflake.snowpark.functions:

      • to_double

      • to_timestamp

      • to_timestamp_ltz

      • to_timestamp_ntz

      • to_timestamp_tz

      • greatest

      • least

      • convert_timezone

      • dateadd

      • date_part

    • snowflake.snowpark.Session:

      • get_current_account

      • get_current_warehouse

      • get_current_role

      • use_schema

      • use_warehouse

      • use_database

      • use_role

Improvements

  • Added telemetry to local testing.

  • Improved the error message of DataFrameReader to raise FileNotFound error when reading a path that does not exist or when there are no files under the path.

Bug fixes

  • Fixed a bug in SnowflakePlanBuilder where save_as_table does not correctly filter columns whose names start with $ and is followed by a number.

  • Fixed a bug where statement parameters might have no effect when resolving imports and packages.

  • Fixed bugs in local testing:

    • LEFT ANTI and LEFT SEMI joins drop rows with null values.

    • DataFrameReader.csv incorrectly parses data when the optional parameter field_optionally_enclosed_by is specified.

    • Column.regexp only considers the first entry when pattern is a Column.

    • Table.update raises KeyError when updating null values in the rows.

    • VARIANT columns raise errors at DataFrame.collect.

    • count_distinct does not work correctly when counting.

    • Null values in integer columns raise TypeError.

Version 1.13.0 (2024-02-26)

Version 1.13.0 of the Snowpark library introduces some new features.

New Features

  • Added support for an optional date_part argument in function last_day.

  • SessionBuilder.app_name will set the query_tag after the session is created.

  • Added support for the following local testing functions:

    • current_timestamp

    • current_date

    • current_time

    • strip_null_value

    • upper

    • lower

    • length

    • initcap

Improvements

  • Added cleanup logic at interpreter shutdown to close all active sessions.

Bug fixes

  • Fixed a bug in DataFrame.to_local_iterator where the iterator could yield wrong results if another query is executed before the iterator finishes due to wrong isolation level.

  • Fixed a bug that truncated table names in error messages while running a plan with local testing enabled.

  • Fixed a bug that Session.range returns empty result when the range is large.

Version 1.12.1 (2024-02-08)

Version 1.12.1 of the Snowpark library introduces some new features.

Improvements

  • Use split_blocks=True by default, during to_pandas conversion, for optimal memory allocation. This parameter is passed to pyarrow.Table.to_pandas, which enables PyArrow to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.

Bug fixes

  • Fixed a bug in DataFrame.to_pandas that caused an error when evaluating on a Dataframe with an IntergerType column with null values.

Version 1.12.0 (2024-01-29)

Version 1.12.0 of the Snowpark library introduces some new features.

Behavior Changes (API Compatible)

  • When parsing data types during a to_pandas operation, we rely on GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as int8 gets returned as int64. Users can fix this by explicitly specifying precision values for their return column.

  • Aligned behavior for Session.call in case of table stored procedures where running Session.call would not trigger a stored procedure unless a collect() operation was performed.

  • StoredProcedureRegistration now automatically adds snowflake-snowpark-python as a package dependency on the client’s local version of the library. An error is thrown if the server cannot support that version.

New features

  • Exposed statement_params in StoredProcedure.__call__.

  • Added two optional arguments to Session.add_import:

    • chunk_size: The number of bytes to hash per chunk of the uploaded files.

    • whole_file_hash: By default only the first chunk of the uploaded import is hashed to save time. When this is set to True each uploaded file is fully hashed instead.

  • Added parameters external_access_integrations and secrets when creating a UDAF from Snowpark Python to allow integration with external access.

  • Added a new method Session.append_query_tag, which allows an additional tag to be added to the current query tag by appending it as a comma separated value.

  • Added a new method Session.update_query_tag, which allows updates to a JSON encoded dictionary query tag.

  • SessionBuilder.getOrCreate will now attempt to replace the singleton it returns when token expiration has been detected.

  • Added the following functions in snowflake.snowpark.functions:

    • array_except

    • create_map

    • sign / signum

  • Added the following functions to DataFrame.analytics:

    • Added the moving_agg function in DataFrame.analytics to enable moving aggregations like sums and averages with multiple window sizes.

    • Added the cummulative_agg function in DataFrame.analytics to enable moving aggregations like sums and averages with multiple window sizes.

Bug fixes

  • Fixed a bug in DataFrame.na.fill that caused Boolean values to erroneously override integer values.

  • Fixed a bug in Session.create_dataframe where the Snowpark DataFrames created using pandas DataFrames were not inferring the type for timestamp columns correctly. The behavior is as follows:

    • Earlier timestamp columns without a timezone would be converted to nanosecond epochs and inferred as LongType(), but will now be correctly maintained as timestamp values and be inferred as TimestampType(TimestampTimeZone.NTZ).

    • Earlier timestamp columns with a timezone would be inferred as TimestampType(TimestampTimeZone.NTZ) and loose timezone information but will now be correctly inferred as TimestampType(TimestampTimeZone.LTZ) and timezone information is retained correctly.

    • Set session parameter PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME to revert back to old behavior. Snowflake recommends that you update your code to align with correct behavior because the parameter will be removed in the future.

  • Fixed a bug that DataFrame.to_pandas gets decimal type when scale is not 0, and creates an object dtype in pandas. Instead, we cast the value to a float64 type.

  • Fixed bugs that wrongly flattened the generated SQL when one of the following happens:

    • DataFrame.filter() is called after DataFrame.sort().limit().

    • DataFrame.sort() or filter() is called on a DataFrame that already has a window function or sequence-dependent data generator column. For instance, df.select("a", seq1().alias("b")).select("a", "b").sort("a") won’t flatten the sort clause anymore.

    • A window or sequence-dependent data generator column is used after DataFrame.limit(). For instance, df.limit(10).select(row_number().over()) won’t flatten the limit and select in the generated SQL.

  • Fixed a bug where aliasing a DataFrame column raised an error when the DataFame was copied from another DataFrame with an aliased column. For instance,

    df = df.select(col("a").alias("b"))
    df = copy(df)
    df.select(col("b").alias("c"))  # Threw an error. Now it's fixed.
    
    Copy
  • Fixed a bug in Session.create_dataframe that the non-nullable field in a schema is not respected for Boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.

  • Fixed a bug in SQL simplifier where non-select statements in session.sql dropped a SQL query when used with limit().

  • Fixed a bug that raised an exception when session parameter ERROR_ON_NONDETERMINISTIC_UPDATE is true.