Run Notebooks on Snowpark Container Services

Overview

You can run Snowflake Notebooks on Snowpark Container Services through Container Runtime. Snowpark Container Services gives you a flexible container infrastructure that supports building and operationalizing a wide variety of workflows entirely within Snowflake. Container Runtime provides software and hardware options to support advanced data science and machine learning workloads on Snowpark Container Services. Compared to virtual warehouses, Container Runtime provides a more flexible compute environment where you can install packages from multiple sources and select compute resources, including GPU machine types, while still running SQL queries on warehouses for optimal performance.

This document describes some considerations for using notebooks on Snowpark Container Services and how to set up your notebook to use Container Runtime.

Considerations for running Notebooks on Snowpark Container Services

Known limitations for preview

  • When using the Snowpark ML fit method in your notebook, always make sure the data is backed by permanent tables.

  • Only one notebook can run on each compute node. During preview, no error message is shown if you try to start your notebook in a compute pool that doesn't have enough available nodes. If your notebook fails to start, try selecting a different compute pool. You can also check pool capacity first, as shown in the example after this list.
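A minimal way to check pool capacity from a SQL worksheet before starting the notebook (notebook_comp_pool is a placeholder name):

SHOW COMPUTE POOLS;
DESCRIBE COMPUTE POOL notebook_comp_pool;

The output includes the pool state and node counts, which indicate whether a node is free for your notebook.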

Prerequisites

Before you start using Snowflake Notebooks on Snowpark Container Services, the ACCOUNTADMIN role must complete the following:

In addition to these tasks, if you want to be able to install packages from internet repositories such as PyPI and Hugging Face, the ACCOUNTADMIN must set up external network access and grant your role USAGE privileges.

Set up external access integration (EAI) access

With external access integration (EAI), you can enable secure access to specific network locations external to Snowflake, and then use that access from within the handler code for user-defined functions (UDFs) and stored procedures.

Note

This must be executed using the ACCOUNTADMIN role.

The following examples show how to set up this access for sites that host popular ML libraries:

Create an external access integration for PyPI:

CREATE OR REPLACE NETWORK RULE pypi_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('pypi.org', 'pypi.python.org', 'pythonhosted.org', 'files.pythonhosted.org');

CREATE EXTERNAL ACCESS INTEGRATION pypi_access_integration
ALLOWED_NETWORK_RULES = (pypi_network_rule)
ENABLED = true;

Create an external access integration for Hugging Face:

CREATE OR REPLACE NETWORK RULE hf_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('huggingface.co', 'cdn-lfs.huggingface.co');

CREATE EXTERNAL ACCESS INTEGRATION hf_access_integration
ALLOWED_NETWORK_RULES = (hf_network_rule)
ENABLED = true;

Allow all network access with one external access integration:

CREATE OR REPLACE NETWORK RULE allow_all_rule
MODE = 'EGRESS'
TYPE = 'HOST_PORT'
VALUE_LIST = ('0.0.0.0:443', '0.0.0.0:80');

CREATE EXTERNAL ACCESS INTEGRATION allow_all_access_integration
ALLOWED_NETWORK_RULES = (allow_all_rule)
ENABLED = true;

You can grant USAGE on the integrations with the following commands:

GRANT USAGE ON INTEGRATION pypi_access_integration TO ROLE <role_name>;
GRANT USAGE ON INTEGRATION hf_access_integration TO ROLE <role_name>;
GRANT USAGE ON INTEGRATION allow_all_access_integration TO ROLE <role_name>;

Note

You must grant USAGE to the specific role that creates the notebooks. Granting USAGE to the PUBLIC role does not work.

For detailed syntax, see external network access.

Enable EAI for notebooks

After your ACCOUNTADMIN has created external access integrations (EAI) and granted your role USAGE on them, you must enable EAI on your notebook. This allows you to install packages from external repositories.

Note

You must run this command before you start the notebook session or run any cells.

To enable EAI for your notebook, create a new SQL worksheet and execute the following ALTER command.

ALTER NOTEBOOK <your_notebook_name> SET EXTERNAL_ACCESS_INTEGRATIONS = (allow_all_access_integration);
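
The EXTERNAL_ACCESS_INTEGRATIONS parameter accepts a list, so you can attach more than one integration in a single statement. For example, a sketch that enables the PyPI and Hugging Face integrations created earlier:

ALTER NOTEBOOK <your_notebook_name> SET EXTERNAL_ACCESS_INTEGRATIONS = (pypi_access_integration, hf_access_integration);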

Access requirements

Notebook access is role-based. The notebook role must be granted USAGE privileges on all the resources for the notebook, including the warehouse, compute pool, database, and schema:

GRANT USAGE ON WAREHOUSE notebooks_wh TO ROLE <role_name>;
GRANT USAGE ON COMPUTE POOL notebook_comp_pool TO ROLE <role_name>;
GRANT USAGE ON DATABASE notebook_db TO ROLE <role_name>;
GRANT USAGE ON SCHEMA notebook_db.public TO ROLE <role_name>;

Note that the database and schema are only required for storing your notebooks. You can query any database and schema your role has access to. Use the USE DATABASE or USE SCHEMA command in a SQL cell to change the context to a different database or schema.
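
For example, to switch the context of subsequent SQL cells (the database and schema names here are placeholders):

USE DATABASE analysis_db;
USE SCHEMA analysis_db.public;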

The notebook role must also have the CREATE NOTEBOOK and CREATE SERVICE privileges granted on the schema:

GRANT CREATE NOTEBOOK ON SCHEMA notebook_db.public TO ROLE <role_name>;
GRANT CREATE SERVICE ON SCHEMA notebook_db.public TO ROLE <role_name>;

Create a notebook on Snowpark Container Services

When you create a notebook on Snowpark Container Services, you choose a warehouse, runtime, and compute pool to provide the resources to run your notebook. The runtime you choose gives you access to different Python packages based on your use case, and different warehouse sizes or compute pools have different cost and performance implications. All of these settings can be changed later if needed.

To create a Snowflake Notebook to run on Snowpark Container Services, follow these steps:

  1. Sign in to Snowsight.

  2. Select Notebooks.

  3. Select + Notebook.

  4. Enter a title for your notebook.

  5. (Optional) Change the selected warehouse used to run SQL and Snowpark queries.

    For guidance on what size warehouse to use, see Warehouse recommendations for running Snowflake Notebooks.

  6. Select a database and schema in which to store your notebook. These cannot be changed after you create the notebook.

  7. Select Run on container as your Python environment.

  8. Select the Runtime type: CPU or GPU.

  9. Select a Compute pool.

  10. To create and open your notebook, select Create.

Runtime:

This private preview provides only container runtimes. You can choose between CPU and GPU runtimes. Each runtime image contains a base set of Python packages and versions verified and integrated by Snowflake. All runtime images support data analysis, modeling, and training with Snowpark Python, Snowpark ML, and Streamlit.

For more details on the container image, see Container Runtime for ML.

Compute pool:

A compute pool provides the compute resources for your notebook kernel and Python code. Use smaller, CPU-based compute pools to get started, and select higher-memory, GPU-based compute pools to optimize for intensive GPU usage scenarios like computer vision or LLMs/VLMs.

Note that each compute node is limited to running one notebook at a time. For more details on Snowpark Container Services compute pools, see Snowpark Container Services: Working with compute pools.
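
If no suitable pool exists yet, an administrator can create one and grant your role access. The following is a minimal sketch; the pool name matches the grants above, and CPU_X64_XS is one example instance family that your account may offer:

CREATE COMPUTE POOL notebook_comp_pool
MIN_NODES = 1
MAX_NODES = 1
INSTANCE_FAMILY = CPU_X64_XS;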

You can also import and export .ipynb files. See Create Snowflake Notebooks from an existing file and Export your notebook as file for sharing.

Run a notebook on Snowpark Container Services

After you create your notebook, you can start running code immediately by adding and running cells. For information about adding cells, see Develop and run code in Snowflake Notebooks.

Importing more packages

In addition to the pre-installed packages that get your notebook up and running, you can install packages from public sources for which you have set up external access, and use packages stored in a stage or a private repository. Before you start installing from external channels, use the ALTER NOTEBOOK statement to enable EAI on your notebook. For instructions, see Enable EAI for notebooks.

The following example installs an external package using pip install in a code cell:

!pip install transformers scipy ftfy accelerate

Update notebook settings

You can update settings, such as which compute pool or warehouse to use, at any time in Notebook settings, which you can access through the Notebook actions (more actions) menu.

Cost/billing considerations

Running Snowflake Notebooks on Snowpark Container Services incurs standard Snowpark Container Services compute costs. See Snowpark Container Services costs for details.
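
One way to monitor this spend is to query the account usage metering views. The following is a minimal sketch; it assumes container compute is reported under the SNOWPARK_CONTAINER_SERVICES service type in SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY:

SELECT usage_date,
       SUM(credits_used) AS credits_used
FROM snowflake.account_usage.metering_daily_history
WHERE service_type = 'SNOWPARK_CONTAINER_SERVICES'  -- assumed service type value
GROUP BY usage_date
ORDER BY usage_date;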

Snowflake Notebooks require a virtual warehouse to run SQL and Snowpark queries. You might therefore also incur virtual warehouse compute costs for SQL run in SQL cells and for Snowpark push-down queries executed from Python cells. The following diagram shows where compute happens for each type of cell.

Diagram showing the compute distribution of notebook cells.

For example, the following Python example uses the XGBRegressor class from Snowpark ML, which pushes down compute to a warehouse:

from snowflake.ml.modeling.xgboost import XGBRegressor

# CATEGORICAL_COLUMNS_OE, NUMERICAL_COLUMNS, LABEL_COLUMNS, and OUTPUT_COLUMNS
# are lists of column names; train_df and test_df are Snowpark DataFrames.
regressor = XGBRegressor(
    input_cols=CATEGORICAL_COLUMNS_OE + NUMERICAL_COLUMNS,
    label_cols=LABEL_COLUMNS,
    output_cols=OUTPUT_COLUMNS
)
regressor.fit(train_df)  # training runs as push-down queries on the warehouse
result = regressor.predict(test_df)

In the following example, the open source xgboost library is used instead. The data is pulled into the container and compute occurs on Snowpark Container Services:

import xgboost as xgb

# X, y, and test_df are in-memory data (for example, pandas DataFrames or NumPy
# arrays) pulled into the container; training runs on the compute pool's GPU.
regressor = xgb.XGBRegressor(tree_method="hist", device="cuda")
regressor.fit(X, y)
result = regressor.predict(test_df)

See Overview of warehouses to understand warehouse costs. See also Warehouse recommendations for running Snowflake Notebooks.