Experience Snowflake with notebooks

Snowflake Notebooks is a development surface that you can use with many other Snowflake features. This topic shows how to leverage other Snowflake capabilities within notebooks.

Snowpark Python in notebooks

The Snowpark library provides an intuitive API for querying and processing data in a data pipeline. Using the Snowpark library, you can build applications that process data in Snowflake without moving data to the system where your application code runs. You can also automate data transformation and processing by writing stored procedures and scheduling those procedures as tasks in Snowflake.

You can use Snowpark to query and process data at scale in Snowflake by writing Snowpark code in a Python cell of your notebook.

Example usage

Snowpark Python comes pre-installed with the Snowflake Notebooks environment. The following example uses the Snowpark library in a notebook to read a CSV file and a Snowflake table into DataFrames and display their contents as output.

  1. In your notebook, add a Python cell, either using a keyboard shortcut or by selecting + Python. Snowflake Notebooks support the same Python version supported by Snowpark, Python 3.9.

  2. Set up a Snowpark session. In notebooks, the session context variable is preconfigured. You can use the get_active_session method to get the session context variable:

    from snowflake.snowpark.context import get_active_session
    session = get_active_session()
    
  3. Use Snowpark to load a CSV file into a Snowpark DataFrame from a stage location. This example uses a stage called tastybyte_stage.

    df = session.read.options({"infer_schema":True}).csv('@TASTYBYTE_STAGE/app_order.csv')
    
  4. Load an existing Snowflake table, app_order, into a Snowpark DataFrame. This reassigns df, replacing the DataFrame created in the previous step.

    df = session.table("APP_ORDER")
    
  5. Display the Snowpark DataFrame.

    df
    

Note

Outside of the Snowflake Notebooks environment, you need to call df.show() to print the DataFrame. In Snowflake Notebooks, a DataFrame is evaluated eagerly when df is printed, and it is rendered as an interactive Streamlit DataFrame display (st.dataframe). For DataFrames with more than 10,000 rows, at most 10,000 rows are displayed.

Snowpark limitations

  • A Snowflake Notebook creates a Snowpark session, so you can use most of the methods available in a Snowpark Session class. However, because a notebook runs inside Snowflake rather than in your local development environment, you cannot use the following methods:

    • session.add_import

    • session.add_packages

    • session.add_requirements

  • Some Snowpark Python operations don’t work with stored procedures. For a complete list of operations, see Limitations.

Tip

The following are links to notebooks with more examples of using Snowpark:

Note

These are only shown as examples, and following along with the example may require additional rights to third-party data, products, or services that are not owned or provided by Snowflake. Snowflake does not guarantee the accuracy of these examples.

Streamlit in notebooks

Streamlit is an open-source Python library that makes it easy to create and share custom web apps for machine learning and data science. You can build, test, and develop interactive data applications with Streamlit directly in your notebook, without going to a separate terminal window to serve the app. Streamlit comes preinstalled in notebooks, so you can get started right away.

Example usage

The example in this section creates an interactive data app using Streamlit, which comes pre-installed with the Snowflake Notebooks environment.

  1. Import the necessary libraries.

    import streamlit as st
    import pandas as pd
    
  2. First create some sample data for the app.

    species = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3
    measurements = ["sepal_length", "sepal_width", "petal_length"] * 3
    values = [5.1, 3.5, 1.4, 6.2, 2.9, 4.3, 7.3, 3.0, 6.3]
    df = pd.DataFrame({"species": species,"measurement": measurements,"value": values})
    df
    
  3. Set up your interactive slider from the Streamlit library.

    st.markdown("""# Interactive Filtering with Streamlit! :balloon:
                Values will automatically cascade down the notebook cells""")
    value = st.slider("Move the slider to change the filter value 👇", df.value.min(), df.value.max(), df.value.mean(), step = 0.3 )
    
  4. Finally, display a filtered table based on the slider value.

    df[df["value"]>value].sort_values("value")
    

You can interact with the app in real time from the notebook. See the filtered table change based on the value you set on the slider.
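
The filter in step 4 is an ordinary comparison followed by a sort, so its result can be checked without Streamlit. The following is a minimal sketch in plain Python. It assumes the sample values from step 2 and uses their mean as the threshold, which is the slider's initial value in step 3. The function name filter_above is hypothetical and is not part of Streamlit or Snowflake.

```python
from statistics import mean

# Sample values from step 2 of the example.
values = [5.1, 3.5, 1.4, 6.2, 2.9, 4.3, 7.3, 3.0, 6.3]


def filter_above(data, threshold):
    """Return the values strictly greater than threshold, in ascending order."""
    return sorted(v for v in data if v > threshold)


# The slider in step 3 starts at the mean of the values (about 4.44).
default_threshold = mean(values)
print(filter_above(values, default_threshold))
```

Moving the slider in the notebook corresponds to calling the function with a different threshold.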

Tip

For the complete example, see the interactive data app section of the Visual Data Stories with Snowflake Notebooks notebook.

Streamlit support in notebooks

When you use the st.map or st.pydeck_chart Streamlit commands, Mapbox provides the map tiles when rendering map content. Mapbox is a third-party application and is subject to Snowflake’s External Offerings Terms.

Some Streamlit elements are not supported:

Snowflake Copilot in notebooks

Snowflake Copilot is an LLM-powered assistant that simplifies data analysis while maintaining robust data governance, and seamlessly integrates into your existing Snowflake workflow.

You can interact with Copilot in Snowflake Notebooks in Snowsight. Using the Copilot panel, you can enter a question, and Snowflake Copilot will reply with an answer. You can run suggested SQL queries in your notebook.

Example usage

Follow these steps to start using Snowflake Copilot in your notebook:

  1. Create a new notebook or open an existing notebook.

  2. Select Ask Copilot in the lower-right corner of the notebook. The Snowflake Copilot panel opens on the right side of the notebook.

  3. Make sure a database and a schema are selected for the current notebook. If not, you can select them by using the selector below the Snowflake Copilot message box.

  4. In the message box, type in your question and then select the send icon or press Enter to submit it. Snowflake Copilot provides a response in the panel.

  5. If the response from Snowflake Copilot includes SQL statements:

    • Select Run to run the query. This adds the query to your notebook and runs it.

    • Select Add to edit the query before running it. This adds the query to your notebook.

To learn more about Snowflake Copilot, see Using Snowflake Copilot.

Snowpark ML in notebooks

Snowpark ML is the Python library that provides the APIs for Snowflake ML and for custom machine learning model development in Snowflake. Using Snowpark ML, you can develop custom models using APIs based on popular ML frameworks, define automatically updated features to train them, and store them in a model registry for easy discovery and reuse.

Important

The snowflake-ml-python package and its dependencies must be allowed by your organization’s package policy.

Example usage

To use Snowpark ML, install the snowflake-ml-python library for your notebook:

  1. From the notebook, select Packages.

  2. Locate the snowflake-ml-python library and select the library to install it.

Here is an example of how you can use the Snowpark ML library for preprocessing your data:

import snowflake.ml.modeling.preprocessing as pp

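# Initialize a StandardScaler object with input and output column names.
# feature_names_input (a list of column names) and upsampled_data (a DataFrame)
# are assumed to be defined earlier in the notebook.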
scaler = pp.StandardScaler(
    input_cols=feature_names_input,
    output_cols=feature_names_input
)

# Fit the scaler to the dataset
scaler.fit(upsampled_data)

# Transform the dataset using the fitted scaler
scaled_features = scaler.transform(upsampled_data)
scaled_features
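
For intuition about what the scaler computes: standard scaling subtracts each column's mean and divides by its standard deviation. The sketch below shows that arithmetic in plain Python on a made-up column. It assumes the population standard deviation (the scikit-learn convention) and is not the Snowpark ML implementation. The function name standard_scale and the sample values are hypothetical.

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Scale values to zero mean and unit variance: z = (x - mean) / std."""
    mu = fmean(column)
    sigma = pstdev(column)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant column has no spread; map every value to 0.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Made-up sample with mean 5 and population standard deviation 2.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = standard_scale(sample)
print(scaled)
```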

Here is an example of how you can use the Snowpark ML library for model training and inference:

from snowflake.ml.modeling.ensemble import RandomForestClassifier

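# Initialize a RandomForestClassifier object with input, label, and output column names.
# feature_names_input, label, output_label, training, and testing are assumed
# to be defined earlier in the notebook.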
model = RandomForestClassifier(
    input_cols=feature_names_input,
    label_cols=label,
    output_cols=output_label,
)

# Train the RandomForestClassifier model using the training set
model.fit(training)

# Predict the target variable for the testing set using the trained model
results = model.predict(testing)

Tip

For more examples of using Snowpark ML, see the following notebooks:

ML Registry in notebooks

Snowflake Model Registry allows customers to securely manage models and their metadata in Snowflake, regardless of origin. The model registry stores machine learning models as first-class schema-level objects in Snowflake so they can easily be found and used by others in your organization. You can create registries, and store models in them, using classes in the Snowpark ML library. Models can have multiple versions, and you can designate a version as the default.

Example usage

To use the Snowflake ML registry, install the snowflake-ml-python library for your notebook:

  1. From your notebook, select Packages at the top.

  2. Search for the snowflake-ml-python package and select the library to install it.

Here is an example of how you can use the Snowflake ML Registry to log a model:

from snowflake.ml.registry import Registry
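# Create a registry and log the model.
# session, db, schema, model_name, regressor, X, and mape are assumed
# to be defined earlier in the notebook.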
native_registry = Registry(session=session, database_name=db, schema_name=schema)

# Let's first log the very first model we trained
model_ver = native_registry.log_model(
    model_name=model_name,
    version_name='V0',
    model=regressor,
    sample_input_data=X, # to provide the feature schema
)

# Add evaluation metric
model_ver.set_metric(metric_name="mean_abs_pct_err", value=mape)

# Add a description
model_ver.comment = "This is the first iteration of our Diamonds Price Prediction model. It is used for demo purposes."

# Show Models
native_registry.get_model(model_name).show_versions()

Tip

This video shows an end-to-end example of how you can use Snowflake ML Registry.

pandas on Snowflake in notebooks

pandas on Snowflake lets you run your pandas code in a distributed manner directly on your data in Snowflake. Just by changing the import statement and a few lines of code, you can get the same pandas-native experience you know and love with the scalability and security benefits of Snowflake.

With pandas on Snowflake, you can work with much larger datasets and avoid the time and expense of porting your pandas pipelines to other big data frameworks or provisioning large and expensive machines. It runs workloads natively in Snowflake through transpilation to SQL, enabling it to take advantage of parallelization and the data governance and security benefits of Snowflake.

pandas on Snowflake is delivered through the Snowpark pandas API as part of the Snowpark Python library, which enables scalable data processing of Python code within the Snowflake platform.

Example usage

Snowpark pandas is available in Snowpark Python version 1.17 and later. Snowpark Python comes pre-installed with the Snowflake Notebooks environment.

You also need to install Modin by selecting modin from Packages.

In a Python cell, import Modin and the Snowpark pandas plugin:

import modin.pandas as pd
import snowflake.snowpark.modin.plugin
  1. Get the active Snowpark session:

    from snowflake.snowpark.context import get_active_session
    session = get_active_session()
    
  2. Start using the Snowpark pandas API:

    # Create a Snowpark pandas DataFrame with sample data.
    df = pd.DataFrame([[1, 'Big Bear', 8],[2, 'Big Bear', 10],[3, 'Big Bear', None],
                        [1, 'Tahoe', 3],[2, 'Tahoe', None],[3, 'Tahoe', 13],
                        [1, 'Whistler', None],['Friday', 'Whistler', 40],[3, 'Whistler', 25]],
                        columns=["DAY", "LOCATION", "SNOWFALL"])
    # Drop rows with null values. dropna() returns a new DataFrame, so reassign it.
    df = df.dropna()
    # Compute the average daily snowfall for each location.
    df.groupby("LOCATION")["SNOWFALL"].mean()
    
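
To sanity-check the expected numbers without a Snowflake connection, the same arithmetic can be written in plain Python. This sketch does not use the Snowpark pandas API. It only reproduces the intended computation (skip rows with a null SNOWFALL, then average per location) on the sample data above. The helper name mean_snowfall is hypothetical.

```python
from collections import defaultdict

# Same sample rows as the DataFrame above: (DAY, LOCATION, SNOWFALL).
rows = [
    (1, "Big Bear", 8), (2, "Big Bear", 10), (3, "Big Bear", None),
    (1, "Tahoe", 3), (2, "Tahoe", None), (3, "Tahoe", 13),
    (1, "Whistler", None), ("Friday", "Whistler", 40), (3, "Whistler", 25),
]


def mean_snowfall(records):
    """Average the non-null snowfall values for each location."""
    by_location = defaultdict(list)
    for _day, location, snowfall in records:
        if snowfall is not None:  # equivalent to dropping rows with nulls
            by_location[location].append(snowfall)
    return {loc: sum(vals) / len(vals) for loc, vals in by_location.items()}


print(mean_snowfall(rows))
# {'Big Bear': 9.0, 'Tahoe': 8.0, 'Whistler': 32.5}
```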

Tip

For a more detailed example of how to use the pandas on Snowflake API, see Getting Started with pandas on Snowflake.