Snowflake ML release notes¶
This article contains the release notes for Snowflake ML, including the following when applicable:
Behavior changes
New features
Customer-facing bug fixes
Note
These notes do not include changes in features that have not been publicly announced. Such features may appear in the Snowflake ML source code but not in the public documentation.
See Snowflake ML: End-to-End Machine Learning for documentation.
Verifying the snowflake-ml-python package¶
All Snowflake packages are signed, allowing you to verify their origin. To verify the snowflake-ml-python package, follow these steps:

Install cosign. This example uses the Go installation: Installing cosign with Go.

Download the file from a repository such as PyPI.

Download a .sig file for that release from the GitHub releases page.

Verify the signature using cosign. For example:

cosign verify-blob snowflake_ml_python-1.7.0.tar.gz --key snowflake-ml-python-1.7.0.pub --signature resources.linux.snowflake_ml_python-1.7.0.tar.gz.sig
Note
This example uses the library and signature for version 1.7.0 of the package. Use the filenames of the version you are verifying.
Version 1.7.0 (2024-10-22)¶
Behavior changes¶
General behavior changes:
Python 3.9 is now the minimum required version.
Data connector behavior changes:
to_torch_dataset and to_torch_datapipe now create a dimension of 1 for scalar data. This allows more seamless integration with the PyTorch DataLoader, which creates batches by stacking inputs. The following example illustrates the difference.

ds = connector.to_torch_dataset(shuffle=False, batch_size=3)
Input data: "col1": [10, 11, 12]

Previous result: array([10., 11., 12.]) with shape (3,)

New result: array([[10.], [11.], [12.]]) with shape (3, 1)

Input data: [[0, 100], [1, 110], [2, 200]]

Previous result: array([[ 0, 100], [ 1, 110], [ 2, 200]]) with shape (3, 2)

New result: No change
You can now specify a batch size of None in to_torch_dataset to squeeze dimensions of 1 for better interoperability with the PyTorch DataLoader. None is the new default batch size.
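The shape changes above can be sketched with plain NumPy, independent of the Snowflake and PyTorch APIs (the array values mirror the examples above):

```python
import numpy as np

# Scalar column: previously yielded shape (3,); the new behavior adds a
# trailing dimension of 1 so DataLoader-style stacking sees one row per sample.
col1 = np.array([10., 11., 12.])
batched = col1.reshape(-1, 1)            # new result: shape (3, 1)

# Multi-dimensional input already has one row per sample, so it is unchanged.
col2 = np.array([[0, 100], [1, 110], [2, 200]])

# With batch_size=None, dimensions of 1 are squeezed back out.
squeezed = np.squeeze(batched, axis=-1)  # shape (3,)

print(batched.shape, col2.shape, squeezed.shape)
```

This is only an illustration of the resulting shapes; the actual conversion is done inside the data connector.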
Model Development behavior changes:
The eps (epsilon) argument is no longer used with the log_loss metric. The argument is still accepted for backward compatibility, but its value is ignored; the epsilon is now computed by the underlying scikit-learn implementation.
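For context, the epsilon clamps predicted probabilities away from 0 and 1 so that the logarithm stays finite. A minimal pure-Python sketch of that role (an illustration only, not the scikit-learn implementation):

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy with probability clamping (illustrative only)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clamp so log() never sees 0 or 1
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

print(round(log_loss([1, 0], [0.9, 0.1]), 5))  # → 0.10536
```

Because of the clamp, even a prediction of exactly 0.0 for a positive label yields a large but finite loss rather than a math domain error.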
Model Registry behavior changes:
External access integrations are no longer required when creating an inference service in Snowflake 8.40 or later.
New features¶
New Model Registry features:
You can now pass keyword arguments when instantiating ModelContext to provide a variable number of context values. For example:

mc = custom_model.ModelContext(
    config='local_model_dir/config.json',
    m1=model1,
)

class ExamplePipelineModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)
        v = open(self.context['config']).read()
        self.bias = json.loads(v)['bias']

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        model_output = self.context['m1'].predict(input)
        return pd.DataFrame({'output': model_output + self.bias})
Support for pandas’s CategoricalDtype for categorical columns.

The log_model method now accepts both signature and sample_input_data parameters to capture background data for explainability and data lineage.
Bug fixes¶
Data Connector bug fixes:
For multi-dimensional data, to_torch_dataset and to_torch_datapipe now return a numpy array with an appropriate data type instead of a list.
Feature Store bug fixes:
Fixed an issue where ExampleHelper used an incomplete table name.

Changed weather features aggregation time to one hour instead of one day.
Model Explainability bug fixes:
Fixed an issue with explainability for XGBoost models by using a new SHAP library version.
Version 1.6.4 (2024-10-17)¶
Bug fixes¶
Model Registry bug fixes:
Fixed an issue with using ModelVersion.run with Model Serving (inference on SPCS).
Version 1.6.3 (2024-10-07)¶
Behavior changes¶
Model Registry behavior changes:
This release no longer contains the preview Model Registry API. Use the public API in snowflake.ml.model_registry instead.
Bug fixes¶
Model Registry bug fixes:
Fix unexpected package name normalizations for packages that do not follow PEP 508 conventions when logging a model.
Fix “Not a valid remote URI” error when logging MLflow models.
Fix nested calls to ModelVersion.run.

Fix log_model failure when a local package version number contains parts other than the base version.
New features¶
New Model Registry features:
You can now set a task type for the model in log_model via the task parameter.
New Feature Store features:
FeatureView now supports ON_CREATE and ON_SCHEDULE initialization modes.
Version 1.6.2 (2024-09-04)¶
Bug fixes¶
Fix a bug involving invalid names passed where fully-qualified names were required. These now correctly raise an exception.
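A fully-qualified Snowflake name has three dot-separated parts: database.schema.object. A hypothetical validator illustrating the kind of check that now raises (real Snowflake identifiers can also be double-quoted and contain dots, which this sketch ignores):

```python
def require_fully_qualified(name: str) -> list[str]:
    # Hypothetical helper: require exactly three non-empty,
    # dot-separated parts: database.schema.object.
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"{name!r} is not a fully-qualified name")
    return parts

db, schema, obj = require_fully_qualified("MY_DB.MY_SCHEMA.MY_MODEL")
```

A bare name such as "MY_MODEL" now raises ValueError instead of being passed through.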
Modeling bug fixes:
Correctly log models built using XGBoost version 2 and higher.
Model explainability bug fixes:
Workarounds and better error handling for XGBoost version 2.1.0 and higher.
Correctly handle multiclass XGBoost classification models.
New features¶
New Feature Store features:
The update_feature_view method now accepts a FeatureView object as an alternative to name and version.
Version 1.6.1 (2024-08-13)¶
Bug fixes¶
Feature Store bug fixes:
Metadata size is no longer limited when generating a dataset.
Model Registry bug fixes:
Fix an error message in the run method of model versions when a function name is not given and the model has multiple target methods.
New features¶
New Modeling features:
The set_params method is now available to set the parameters of the underlying scikit-learn estimator, if the Snowpark ML model has been fitted.
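set_params follows the scikit-learn estimator convention. A minimal sketch of that convention (not the Snowpark ML implementation):

```python
class SketchEstimator:
    """Minimal illustration of the scikit-learn set_params convention."""

    def __init__(self, alpha=1.0, max_iter=100):
        self.alpha = alpha
        self.max_iter = max_iter

    def set_params(self, **params):
        # Update only known parameters; return self to allow chaining.
        for key, value in params.items():
            if not hasattr(self, key):
                raise ValueError(f"Invalid parameter {key!r}")
            setattr(self, key, value)
        return self

est = SketchEstimator().set_params(alpha=0.5)
print(est.alpha)  # → 0.5
```

Returning self allows chained calls such as est.set_params(alpha=0.5).set_params(max_iter=200).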
New Model Registry features:
Support for model explainability in XGBoost, LightGBM, CatBoost, and scikit-learn models supported by the shap library.
Version 1.6.0 (2024-07-29)¶
Behavior changes¶
Feature Store behavior changes:
Many positional arguments are now keyword arguments. The affected arguments for each method are:

Entity initializer: desc
FeatureView initializer: timestamp_col, refresh_freq, desc
FeatureStore initializer: creation_mode
FeatureStore.update_entity: desc
FeatureStore.register_feature_view: block, overwrite
FeatureStore.list_feature_views: entity_name, feature_view_name
FeatureStore.get_refresh_history: verbose
FeatureStore.retrieve_feature_values: spine_timestamp_col, exclude_columns, include_feature_view_timestamp_col
FeatureStore.generate_training_set: save_as, spine_timestamp_col, spine_label_cols, exclude_columns, include_feature_view_timestamp_col
FeatureStore.generate_dataset: version, spine_timestamp_col, spine_label_cols, exclude_columns, include_feature_view_timestamp_col, desc, output_type
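In Python, this kind of change is implemented with keyword-only parameters: everything after a bare * in the signature must be passed by name. A hypothetical signature in the style of register_feature_view, for illustration only:

```python
def register_feature_view(feature_view, version, *, block=True, overwrite=False):
    # Parameters after the bare `*` can only be passed by keyword.
    return {"block": block, "overwrite": overwrite}

# Keyword form works:
result = register_feature_view("my_view", "v1", overwrite=True)

# Positional form now raises TypeError:
try:
    register_feature_view("my_view", "v1", True, False)
except TypeError as exc:
    error = str(exc)
```

Code that previously passed these arguments positionally must be updated to use keywords.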
Add new column warehouse to the output of list_feature_views.
Bug fixes¶
Modeling bug fixes:
Fixed an issue in which SimpleImputer could not impute integer columns with integer values.
Model Registry bug fixes:
Fixed an issue when providing a non-zero-index-based pandas DataFrame to ModelVersion.run.
New features¶
New Feature Store features:
Added overloads to certain methods to accept both a FeatureView and name/version strings. Affected APIs include read_feature_view, refresh_feature_view, get_refresh_history, resume_feature_view, suspend_feature_view, and delete_feature_view.

Added docstring inline examples for all public APIs.
Added ExampleHelper utility class to help with loading source data to simplify public notebooks.

Added update_entity method.

Added warehouse argument to FeatureView constructor to override the default warehouse.
New Model Registry features:
Added option to enable explainability when registering XGBoost, LightGBM, and CatBoost models.
Added support for logging a model from a ModelVersion object.
New modeling features:
You can disable the 10GB training data size limit in distributed hyperparameter optimization by executing:
from snowflake.ml.modeling._internal.snowpark_implementations import (
    distributed_hpo_trainer,
)

distributed_hpo_trainer.ENABLE_EFFICIENT_MEMORY_USAGE = False
Version 1.5.4 (2024-07-11)¶
Bug fixes¶
Model Registry bug fixes:
Fixed “401 Unauthorized” issue when deploying a model to Snowpark Container Services.
Feature Store bug fixes:
Some exceptions in property setters have been downgraded to warnings, allowing you to change desc, refresh_freq, and warehouse in “draft” feature views.
Modeling bug fixes:
Fixed issues with calling OneHotEncoder and OrdinalEncoder with a dictionary as the categories parameter and the data in a pandas DataFrame.
New features¶
New Model Registry features:
Allow overriding device_map and device when loading Hugging Face pipeline models.

Add set_alias and unset_alias methods to ModelVersion instances to manage the model version’s aliases.

Add partitioned_inference_api decorator to create partitioned inference methods in models.
New Feature Store features:
New refresh_freq, refresh_mode, and scheduling_state columns have been added to the output of the list_feature_views method.

The update_feature_view method now supports updating a feature view’s description.

New methods refresh_feature_view and get_refresh_history manage updates of feature views.

New method generate_training_set generates table-backed feature snapshots. generate_dataset(..., output_type="table") has been deprecated and generates a DeprecationWarning.
New Modeling features:
OneHotEncoder and OrdinalEncoder now accept a list of array-like values for the categories argument.
Version 1.5.3 (2024-06-17)¶
Bug fixes¶
Model Registry bug fixes:
Fix an issue causing incorrect results when using a pandas DataFrame with over 100,000 rows as input to the ModelVersion.run method in stored procedures.
Modeling bug fixes:
Fix an issue with passing categories to OneHotEncoder and OrdinalEncoder as a dictionary or as a pandas DataFrame.
New features¶
New Model Registry features:
Model Registry now supports timestamp (TIMESTAMP_NTZ) columns in input and output data.
New modeling features:
OneHotEncoder and OrdinalEncoder now support a list of array-like values for the categories argument.
New Dataset features:
DatasetVersion instances now have label_cols and exclude_cols properties.
Version 1.5.2 (2024-06-10)¶
Bug fixes¶
Model Registry bug fixes:
Fixed an issue that prevented calls to log_model in a stored procedure.
Modeling bug fixes:
Quick fix for import snowflake.ml.modeling.parameters.enable_anonymous_sproc not working due to a package dependency error.
Version 1.5.1 (2024-05-22)¶
New features¶
New Model Registry features:
The log_model, get_model, and delete_model methods now support fully-qualified names.
New modeling features:
You can now use an anonymous stored procedure during fitting, so that modeling does not require privileges to operate on the registry schema. Execute import snowflake.ml.modeling.parameters.enable_anonymous_sproc to enable this feature.
Bug fixes¶
Model registry bug fixes:
Fix issue with loading older models.
Version 1.5.0 (2024-05-01)¶
Behavior changes¶
Model Registry behavior changes:
The fit_transform method can now return either a Snowpark DataFrame or a pandas DataFrame, matching the kind of DataFrame passed to the method.
New features¶
New Model Registry features:
Added support for exporting models from the registry (ModelVersion.export).

Added support for loading the underlying model object (ModelVersion.load).

Added support for renaming models (Model.rename).
Bug fixes¶
Model Registry bug fixes:
Fixed the “invalid parameter SHOW_MODEL_DETAILS_IN_SHOW_VERSIONS_IN_MODEL” error.
Version 1.4.1 (2024-04-18)¶
New features¶
New Model Registry features:
Added support for CatBoost models (catboost.CatBoostClassifier, catboost.CatBoostRegressor).

Added support for LightGBM models (lightgbm.Booster, lightgbm.LGBMClassifier, lightgbm.LGBMRegressor).
Bug fixes¶
Model Registry bug fixes:
Fixed a bug that caused the relax_version option to not work.
Version 1.4.0 (2024-04-08)¶
Behavior changes¶
Model Registry behavior changes:
The apply method is no longer included as a target method by default when logging an XGBoost model. If you need this method available in logged models, include it manually in the target_methods option:

log_model(..., options={"target_methods": ["apply", ...]})
New features¶
New model registry features:
The registry now supports logging sentence transformer models (sentence_transformers.SentenceTransformer).

The version_name argument is no longer required when logging a model. A random human-readable ID is generated if none is provided.
Bug fixes¶
Model registry bug fixes:
Fix an issue where, when multiple models were called in the same query, models after the first returned incorrect results. This fix is applied when models are logged and does not benefit existing models; you must log your models again to correct this behavior.
Modeling bug fixes:
Fix a bug in registering a model where only methods mentioned in save_model were added to the model signature for Snowpark ML models.

Fix a bug in batch inference methods such as predict and predict_log_proba where, when n_jobs was not 1, the methods would not be executed.

Fix a bug in batch inference methods where they could not infer datatypes when the first row of data contained NULL.
The output column names from distributed hyperparameter optimization are now correctly matched with the Snowflake identifier.
Relaxed the versions of dependencies of distributed hyperparameter optimization methods; these were too strict and caused these methods to fail.
scikit-learn is now listed as a dependency of the LightGBM package.
Version 1.3.1 (2024-03-21)¶
New features¶
FileSet/FileSystem updates:
snowflake.ml.fileset.sfcfs.SFFileSystem can now be used in UDFs and stored procedures.
Version 1.3.0 (2024-03-12)¶
Behavior changes¶
Model registry behavior changes:
As previously announced, the default for the relax_version option (in the options argument of log_model) is now True, allowing more reliable deployment in most cases by permitting dependency versions available in Snowflake.

When running model methods, value-range-based input validation (which prevents input from overflowing) is now optional. This should improve performance and should not cause issues for most types of models. To enable validation, pass the named argument strict_input_validation=True when calling the model’s run method.
Model development behavior changes:
The fit_predict method now returns either a pandas or a Snowpark DataFrame, depending on the type of the input data, and is available on all classes where it is available in the underlying scikit-learn, xgboost, or lightgbm class.
New features and updates¶
FileSet/FileSystem updates:
Instances of snowflake.ml.fileset.sfcfs.SFFileSystem can now be serialized with pickle.
Bug fixes¶
Model registry bug fixes:
Fix a problem with importing log_model in some circumstances.

Fix an incorrect error message when validating an input Snowpark DataFrame with an array feature.
Model development bug fixes:
Relax package versions for all inference methods when the installed version of a dependency is not available in the Snowflake conda channel.
Version 1.2.3 (2024-02-26)¶
New features and updates¶
Model development updates:
All modeling classes now include a score_samples method to calculate the log-likelihood of the given samples.
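As an analogy, a score_samples-style method returns a per-sample log-likelihood under the fitted model. A one-dimensional Gaussian sketch of the concept (an illustration, not the Snowpark ML code):

```python
import math

def score_samples(samples, mean=0.0, std=1.0):
    # Per-sample log-likelihood under N(mean, std^2), in the spirit of
    # scikit-learn's score_samples.
    const = -0.5 * math.log(2 * math.pi * std ** 2)
    return [const - (x - mean) ** 2 / (2 * std ** 2) for x in samples]

print(score_samples([0.0]))  # log-density of N(0, 1) at its mean
```

Samples near the mean score higher (less negative) than samples in the tails.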
Model registry updates:
Decimal type features are automatically cast (with a warning) to DOUBLE or FLOAT instead of producing an error.
Improve error message for the currently-unsupported pip-requirements option.

You can now delete a version of a model.
Bug fixes¶
Model development fixes:
precision_recall_fscore_support returned incorrect results with average="samples".
Model registry fixes:
Descriptions, models, and tags were not retrieved correctly in newly-created registries under the private preview model registry API due to a recent Snowflake behavior change.
Version 1.2.2 (2024-02-13)¶
New features and updates¶
Model registry updates:
You can now specify external access integrations when deploying a model to Snowpark Container Services using the private preview registry API, allowing models to access the internet to retrieve dependencies during deployment. The following endpoints are required for all deployments:
docker.com:80
docker.com:443
anaconda.com:80
anaconda.com:443
anaconda.org:80
anaconda.org:443
pypi.org:80
pypi.org:443
For models derived from HuggingFacePipeLineModel, the following endpoints are required:

huggingface.com:80
huggingface.com:443
huggingface.co:80
huggingface.co:443
Version 1.2.1 (2024-01-25)¶
New features and updates¶
Model development updates:
Infer column data type for transformers when possible.
Model registry updates:
The relax_version option (in the options argument of log_model) relaxes dependencies of stated versions to allow newer minor versions when set to True.
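Conceptually, relaxing a pinned version turns an exact pin into a range that admits newer compatible releases. A hypothetical sketch of that transformation (the actual rules snowflake-ml-python applies may differ):

```python
def relax(requirement: str) -> str:
    # Hypothetical: turn "pkg==X.Y.Z" into "pkg>=X.Y.Z,<X+1" so newer
    # minor/patch releases within the same major version are allowed.
    name, version = requirement.split("==")
    major = int(version.split(".")[0])
    return f"{name}>={version},<{major + 1}"

print(relax("xgboost==1.7.3"))  # → xgboost>=1.7.3,<2
```

This widens the set of packages resolvable from the Snowflake conda channel while still excluding breaking major-version upgrades.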
Version 1.2.0 (2024-01-12)¶
New features and updates¶
Public preview release of model registry. See Snowflake Model Registry. The previous private preview release of the model registry has been deprecated, but will continue to be supported while it includes features not yet available in the public preview version.
Model development updates:
Added support for the fit_predict method in the AgglomerativeClustering, DBSCAN, and OPTICS classes.

Added support for the fit_transform method in the MDS, SpectralEmbedding, and TSNE classes.