Interoperability with third-party libraries¶
Many third-party libraries interoperate with pandas, for example by accepting pandas dataframe objects as function inputs. This page gives a non-exhaustive list of third-party library use cases with pandas and notes whether each method also works in Snowpark pandas.
Snowpark pandas supports the dataframe interchange protocol. Some libraries use this protocol to interoperate with Snowpark pandas at the same level of support as with pandas.
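As a rough illustration of how a consuming library can rely on the interchange protocol, the sketch below checks for the protocol's `__dataframe__` entry point and, if it is present, builds a native pandas dataframe with `pandas.api.interchange.from_dataframe` (available in pandas 1.5 and later). The helper names `supports_interchange` and `to_native_pandas` are hypothetical and are not part of pandas or Snowpark pandas.

```python
def supports_interchange(obj) -> bool:
    """Return True if obj exposes the interchange entry point, __dataframe__()."""
    return callable(getattr(obj, "__dataframe__", None))


def to_native_pandas(obj):
    """Build a native pandas DataFrame from any interchange-capable object.

    Hypothetical helper for illustration only.
    """
    if not supports_interchange(obj):
        raise TypeError(f"{type(obj).__name__} does not implement __dataframe__")
    # Imported lazily so the capability check above works without pandas.
    # from_dataframe is provided by pandas >= 1.5.
    from pandas.api.interchange import from_dataframe

    return from_dataframe(obj)
```

A library written this way does not need to know whether it received a pandas or a Snowpark pandas dataframe, only that the object implements the protocol.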
plotly.express¶
For each of the following methods in the plotly.express module, we validate that passing in Snowpark pandas dataframes or series as the data inputs behaves equivalently to passing in pandas dataframes or series.
Note
Currently only plotly versions <6.0.0 are supported through the dataframe interchange protocol.
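One way to guard against an unsupported plotly release is to check the installed version before plotting. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the check looks only at the major version number, so it conservatively treats 6.0.0 pre-releases as unsupported.

```python
from importlib.metadata import PackageNotFoundError, version


def is_supported_plotly(version_string: str) -> bool:
    """True when the major version is below 6, i.e. the release is < 6.0.0."""
    major = int(version_string.split(".")[0])
    return major < 6


def installed_plotly_is_supported() -> bool:
    """Check the plotly version installed in the current environment, if any."""
    try:
        return is_supported_plotly(version("plotly"))
    except PackageNotFoundError:
        # plotly is not installed, so there is nothing to support.
        return False
```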
Method name |
scikit-learn¶
We break down scikit-learn interoperability by category of operation.
For each category, we list scikit-learn operations, each of which may involve multiple method calls. For each of these methods, we validate that passing in Snowpark pandas objects behaves equivalently to passing in pandas objects.
Note
While some scikit-learn methods accept Snowpark pandas inputs, their performance with Snowpark pandas inputs is often much worse than their performance with native pandas inputs. Generally we recommend converting Snowpark pandas inputs to pandas with to_pandas() before passing them to scikit-learn.
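The recommendation above can be sketched as a small wrapper that converts inputs once and then fits on the native data. This is an illustrative pattern, not part of Snowpark pandas or scikit-learn: `ensure_native` and `fit_on_native` are hypothetical names, and the wrapper assumes only that the estimator has a scikit-learn style `fit()` method.

```python
def ensure_native(obj):
    """Return obj.to_pandas() if the object offers it, otherwise obj unchanged.

    Snowpark pandas objects provide to_pandas(); native pandas objects do not,
    so they pass through untouched.
    """
    convert = getattr(obj, "to_pandas", None)
    return convert() if callable(convert) else obj


def fit_on_native(estimator, X, y=None):
    """Convert inputs once, then call the estimator's fit() on native data."""
    X_native = ensure_native(X)
    if y is None:
        return estimator.fit(X_native)
    return estimator.fit(X_native, ensure_native(y))


# Hypothetical usage, assuming X_snow and y_snow are Snowpark pandas objects:
#   from sklearn.linear_model import LogisticRegression
#   model = fit_on_native(LogisticRegression(), X_snow, y_snow)
```

Converting once up front means scikit-learn works on in-memory pandas data for the rest of the call.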
Classification¶
Operation |
Fitting a |
Regression¶
Operation |
Fitting a |
Clustering¶
Clustering method |
|
Dimensionality reduction¶
Operation |
Getting the principal components of a
numerical dataset with |
Model selection¶
Operation |
Choosing parameters for a
|
Note
RandomizedSearchCV causes Snowpark pandas to issue many queries. We strongly recommend converting Snowpark pandas inputs to pandas before using RandomizedSearchCV.
Preprocessing¶
Operation |
Scaling training data with
|