Snowflake Notebooks in Workspaces

Overview

A Snowflake notebook in Workspaces is fully managed and built for end-to-end data science and machine learning development on Snowflake data. This new environment for notebooks includes:

  • Integrated Development Environment (IDE) features - Includes full IDE capabilities for streamlined file management and editing to improve your workflow.

  • Familiar Jupyter experience - Use a standard Jupyter Python notebook environment that connects directly to your Snowflake data while maintaining all governance controls.

  • Optimized for AI/ML workloads - Notebooks in Workspaces runs in a preconfigured container designed for scalable AI/ML development and includes fully managed access to CPUs and GPUs, parallel data loading, and distributed training APIs for popular ML packages (for example, XGBoost, PyTorch, or LightGBM).

  • Governed collaboration - Supports simultaneous multi-user collaboration with built-in governance. Track all changes and maintain a complete history using Git or shared workspaces.

Benefits for machine learning (ML) workflows

Notebooks in Workspaces provides two primary capabilities for ML workflows.

  • End-to-end workflow - The platform enables users to consolidate their complete ML lifecycle, from source data access to model inference, within a single Jupyter notebook environment. This environment is integrated with the underlying data platform, allowing it to inherit existing governance and security controls for the data and code assets.

  • Scalable model development architecture - The architecture supports the development of scalable models by providing open-source software (OSS) model development capabilities. Users can access distributed data loading and training across designated CPU or GPU compute pools. This design simplifies ML infrastructure management by abstracting the need for manual configuration of distributed compute resources.
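
For example, the end-to-end pattern described above can be a single notebook flow. A minimal sketch, assuming a hypothetical feature table ML_DB.PUBLIC.CUSTOMER_FEATURES with a CHURNED label column:

    from snowflake.snowpark import Session
    from sklearn.ensemble import GradientBoostingClassifier

    # Read governed source data through the notebook's Snowpark session.
    session = Session.builder.getOrCreate()
    df = session.table("ML_DB.PUBLIC.CUSTOMER_FEATURES").to_pandas()

    # Train a model in the same environment; the table name and label column
    # above are hypothetical.
    X, y = df.drop(columns=["CHURNED"]), df["CHURNED"]
    model = GradientBoostingClassifier().fit(X, y)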

Governance and access control

Notebooks in Workspaces run exclusively on the Container Runtime, a service powered by Snowpark Container Services. To enable users to create and run notebooks, the following privileges are required:

  • Creation of notebook files - Requires the OWNERSHIP privilege on the workspace. This privilege enables users to create .ipynb files within the workspace environment.

  • Execution of notebooks - Requires the USAGE privilege on the underlying compute pool, which the notebook's execution service needs to run. Compute pools provide the containerized compute for the notebook's kernel. By default, the PUBLIC role is granted USAGE on the system-provided compute pools (SYSTEM_COMPUTE_POOL_CPU and SYSTEM_COMPUTE_POOL_GPU).
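
For example, a minimal sketch of granting execution access, assuming a hypothetical custom role named DS_ROLE and a session whose role can grant these privileges:

    from snowflake.snowpark import Session

    session = Session.builder.getOrCreate()

    # Allow the hypothetical DS_ROLE to run notebook services on the
    # system-provided CPU compute pool.
    session.sql(
        "GRANT USAGE ON COMPUTE POOL SYSTEM_COMPUTE_POOL_CPU TO ROLE DS_ROLE"
    ).collect()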

Notebook service management

When a user runs a notebook, a Snowflake-managed notebook service is dynamically created on a compute pool to host the Python kernel and facilitate execution. These services are personal to each user, can only be used to run notebooks, and are located within the user’s Personal Database (PDB).

Administrator control and cost monitoring

Administrators manage user access and monitor costs primarily through the associated compute pools.

  • Disable notebook execution: Administrators can disable the ability for users to run Notebooks in Workspaces by revoking the USAGE privilege on compute pools from the users' roles.
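
    For example, a minimal sketch that revokes access for a hypothetical role named DS_ROLE:

    from snowflake.snowpark import Session

    session = Session.builder.getOrCreate()

    # Revoking USAGE prevents the role from starting notebook services
    # on this compute pool.
    session.sql(
        "REVOKE USAGE ON COMPUTE POOL SYSTEM_COMPUTE_POOL_CPU FROM ROLE DS_ROLE"
    ).collect()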

  • Drop services: Administrators can drop a notebook service via SQL:

    DROP SERVICE USER$DB_NAME.PUBLIC.[SERVICE_NAME];
    

    Alternatively, administrators can use Snowsight:

    1. Sign in to Snowsight.

    2. In the navigation menu, select Monitoring » Services & jobs.

    3. Select the ellipsis (More options), then select Drop.

Notebooks in Workspaces features

The following sections outline the core features of Snowflake Notebooks in Workspaces and the purpose or benefit each feature provides. The new Notebooks experience offers enhanced performance, improved developer productivity, and Jupyter compatibility.

Integration with Workspaces

  • Notebooks are files in Workspaces - The Workspaces environment supports easy file management, allowing you to iterate on individual notebooks and project files. Create folders, upload files, and organize notebooks. Notebook files open in tabs in your workspace and are editable and executable.

  • Git Workspaces integration - Collaboration is streamlined by maintaining a single source of truth compatible with different development environments. Connect to a Git repo by creating a new workspace and selecting Workspaces » From Git repository. You can pull in your files, create and switch branches, and push changes back with diff resolution.

Updates to compute and cost management

  • Snowpark Container Services compute pools - Optimizes cost and compute power by letting users select the exact CPU or GPU machine types and resources needed for the workload. For more details, see Notebook usage and cost monitoring.

  • Shared container service connection - Reduces notebook start-up time and improves resource utilization. After the first notebook connects to a container service, other notebooks can quickly connect to the same container service and share the compute resources of a single compute pool node. Each notebook still maintains its own virtual environment.

  • Background kernel persistence - Ensures uninterrupted execution of critical, long-running processes like ML training and data engineering jobs. Notebook kernels run until idle timeout, independent of frontend or client connection status.

  • Simplified idle time configuration - Simplifies cost management by preventing unused compute resources from running indefinitely. Idle time is configured on the container service, which automatically shuts down after a defined period of inactivity.

  • Service-level external access integration (EAI) management - EAIs are configured once on the container service and apply to all notebooks in the same workspace. It's no longer necessary to manually configure EAIs for each individual notebook.

Jupyter compatibility

  • Jupyter magic commands - Provides a familiar development experience by leveraging standard Jupyter utilities and productivity features such as cell and line magics. Use %lsmagic to list the built-in magics.
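
    For example, a minimal sketch of standard magics in a notebook cell:

    %lsmagic    # list the available line (%) and cell (%%) magics
    %who        # show the variables defined in the current session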

Package management

  • Pre-installed data science (DS) and ML packages - Provides a flexible and streamlined environment for immediate development without complex initial package installation. Popular packages are pre-installed in the Snowflake Runtime and can be directly imported.

  • Install packages via requirements.txt - Specify and install required package versions using !pip install -r requirements.txt to ensure a consistent environment setup (see the example after the following note).

    Note

    If a package version specified in requirements.txt is outside the supported version range of the pre-installed packages, the Python environment may break.
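
    For example, a minimal sketch with hypothetical version pins:

    # Contents of requirements.txt, uploaded to the workspace (hypothetical pins):
    #   xgboost==2.0.3
    #   lightgbm==4.3.0

    !pip install -r requirements.txt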

  • Install packages from PyPI or other repos - Download packages directly using !pip install after configuring EAIs for secure connectivity to external repo endpoints. This gives users access to a vast ecosystem of packages beyond the pre-installed runtime.

  • Install packages from Workspaces file upload - Download or build .whl or .py files, upload them to your workspace, and install them using !pip install file_name.whl. This lets users install custom, proprietary, or specific package builds that are not available through public repositories.
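
    For example, a minimal sketch with a hypothetical wheel filename:

    # Install a wheel previously uploaded to the workspace root.
    !pip install my_package-0.1.0-py3-none-any.whl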

  • Import from workspace - Import modules from .py files that live in your workspace and share utilities and functions across notebooks. For example:

    from my_utils import my_func

  • Import from stage - Enables secure and governed package deployment by leveraging existing Snowflake data storage and governance controls for package files. Use the Snowpark session to retrieve package files from a Snowflake stage into the container environment for import and use. For example:

    from snowflake.snowpark import Session
    import sys

    # Download math_tools.py from the stage to the container's /tmp/ directory.
    session = Session.builder.getOrCreate()
    session.file.get("@stage_name/math_tools.py", "/tmp/")

    # Make /tmp/ importable, then use the module.
    sys.path.append("/tmp/")
    import math_tools
    math_tools.add_one(3)

Updates to notebook editing

  • Bidirectional SQL <> Python cell referencing - Optimizes developer productivity by allowing seamless language switching and direct reuse of results and variables across cells. SQL results can be directly referenced as pandas DataFrames (for example, dataframe_x). Python variables, including DataFrames, can be referenced in SQL queries (for example, {{variable}}).
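
    For example, a minimal three-cell sketch (the table, columns, and DataFrame name are hypothetical):

    # Cell 1 (SQL) - its result is exposed as a pandas DataFrame such as dataframe_1:
    #   SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

    # Cell 2 (Python) - reference the SQL result directly:
    top_regions = dataframe_1.sort_values("TOTAL", ascending=False).head(5)
    threshold = 1000

    # Cell 3 (SQL) - reference the Python variable with {{ }}:
    #   SELECT * FROM sales WHERE amount > {{threshold}};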

  • Interactive datagrid and automated chart builder - View, search, filter, and sort results on millions of records and generate charts without code. Provides a high-performance, consistent data manipulation and visualization experience across Workspaces editing surfaces.

  • Enhanced minimap and cell status - The minimap improves notebook organization and assists with debugging and navigation through clear section outlines and execution status tracking. A table of contents is generated from Markdown headers, and each cell displays a comprehensive, in-session status (running, succeeded, failed, or modified).

  • Use comments to name code cells - The first-line comment in a Python or SQL cell is used as the cell's name in the minimap, simplifying navigation and providing contextual labeling for cells within large notebooks.
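
    For example, a Python cell whose first-line comment becomes its name in the minimap:

    # Load raw events
    import pandas as pd
    events = pd.read_csv("events.csv")  # events.csv is a hypothetical workspace file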

Classic Snowflake Notebooks vs. Notebooks in Workspaces

  • Warehouse Runtime no longer supported - Notebooks in Workspaces run exclusively on the Container Runtime, which provides a simplified user experience with the benefits of CPU/GPU compute.

  • streamlit package no longer supported - Use matplotlib, seaborn, and other visualization packages instead. The plotly and altair visualization packages are not yet supported. For details, see the Limitations.

  • Anaconda packages no longer supported - Use the pre-installed Container Runtime packages, use EAIs to install more packages (including from PyPI), or use .whl, .py, or .zip files that you've uploaded to a stage or directly to Workspaces.

  • The .to_df() syntax - It isn't necessary to manually convert a SQL result to a DataFrame. Use dataframe_x (shown for each SQL cell) directly as a pandas DataFrame in Python code. You can still convert a Snowpark DataFrame to a pandas DataFrame by calling pandas_df = snowpark_df.to_pandas().

  • Need to adjust context - Notebooks are now in Workspaces, and you may need to explicitly set your role and warehouse using the dropdown in the upper-right corner of the Workspaces editor if you do not have defaults set. You also must set the context in a cell to query your data assets, using the following SQL commands:

    USE DATABASE <database_name>;
    USE SCHEMA <schema_name>;

    You can also query using fully qualified names, for example:

    SELECT * FROM <database_name>.<schema_name>.<table_name>;

  • Cell names - Cell names are temporarily unavailable. To name a code cell, use a comment in the first line of the cell (# for Python and -- for SQL). When you import an existing .ipynb file with cell names, the cell names are used as the attached DataFrame names for SQL cells.

Limitations

  • Renaming your notebook file, other files, folders, or the workspace may cause unexpected behavior, such as disconnection from the service, a cleared notebook output cache, or delays in updating referenced files. Try reconnecting your notebooks if you get disconnected. If you renamed the workspace, try creating and using a new service.

  • The current account limit is 200 active services. Notebooks in different workspaces cannot share the same service. By default, notebooks in the same workspace connect to the same service. However, users can also create more than one service per workspace and connect different notebooks to different services.

  • Notebook services may be restarted over the weekend for container service maintenance. Afterward, rerun your notebooks to restore variables and reinstall any packages you added.

  • Sharing notebooks to different roles is not yet supported. Use Git-backed Workspaces to sync changes to your Git repo for collaboration.

  • Folders and files created from code or the command line (for example, !mkdir or df.to_csv("my_table.csv", index=False)) are only available on the same service until the service is suspended. Files written to the workspace directory (paths starting with /workspace) do not appear in the Workspaces File Explorer (the left pane in the Workspaces UI) and do not persist if the notebook connects to a different service. To ensure files persist, use the Snowpark file operation APIs to save them to a Snowflake stage that you have write access to.
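
    For example, a minimal sketch of persisting a locally written file (the stage name is hypothetical):

    import pandas as pd
    from snowflake.snowpark import Session

    session = Session.builder.getOrCreate()

    # Write locally, then upload to a writable stage so the file survives
    # service suspension or reconnection.
    df = pd.DataFrame({"a": [1, 2, 3]})
    df.to_csv("/tmp/my_table.csv", index=False)
    session.file.put("/tmp/my_table.csv", "@my_stage/outputs/", auto_compress=False)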

  • ipywidgets are not yet supported.

  • To embed an image in your notebook, upload it to your workspace and then display it using a Python cell. For example:

    from IPython.display import Image, display
    display(Image(filename="path/to/example_image.png"))
    

    Embedding images in Markdown cells and using remote images via URLs is not yet supported. For a cleaner presentation, we recommend collapsing code cells to show only the output results.

  • Visualization packages that rely on HTML rendering (such as plotly and altair) are not yet supported.

  • Custom container images and the artifact repository are not yet available for use with Notebooks in Workspaces.

  • Snowflake supports downloading packages using uv pip install. However, uv pip freeze will only list packages installed this way. To see a complete list of packages, including those in the base image and those installed using the standard pip install, use the pip freeze command.
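
    For example (scikit-image is an arbitrary package used for illustration):

    !uv pip install scikit-image   # install via uv
    !uv pip freeze                 # lists only packages installed with uv
    !pip freeze                    # lists all packages, including those in the base image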

  • Enabling secondary roles is no longer required to use personal workspaces. If a user doesn’t have secondary roles set to ALL, they need to select a role that has OWNERSHIP or USAGE privileges on the compute pools and EAIs to create a service. However, if your account has session policies that prevent the use of secondary roles, you will not be able to use Notebooks within personal workspaces. Other personal workspace features, such as SQL files and Git integration, will still be available.

Ask your account representative to contact the Notebooks product team if you have any questions about when specific features will be available.