Writing Snowpark Code in Python Worksheets

Write Snowpark code in Python worksheets to process data using Snowpark Python in Snowsight. By writing code in Python worksheets, you can perform your development and testing in Snowflake without needing to install dependent libraries.

To develop with Python worksheets, do the following:

  1. Prepare roles and packages in Snowflake.

  2. Set up your worksheet for development.

  3. Write Snowpark code in your Python worksheet.

  4. Run your Python worksheet.

For example, you might write code in a Python worksheet that extracts data from stages or database objects in Snowflake, transforms the data, and stores the transformed data in Snowflake. You could then convert that code to a stored procedure and build a data pipeline, all without leaving Snowflake.
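As a rough sketch of that extract, transform, and store flow, the handler below reads a table, keeps a subset of its rows, and writes the result to a second table. The table names RAW_ORDERS and LARGE_ORDERS, the AMOUNT column, and the threshold are hypothetical placeholders rather than objects from this guide. The type annotation from the worksheet boilerplate is omitted here so that the sketch carries no import requirements.

```python
# Hypothetical object names, used only for illustration.
SOURCE_TABLE = "RAW_ORDERS"
TARGET_TABLE = "LARGE_ORDERS"


def main(session):
    # session is the Snowpark Session that the worksheet passes to the handler.

    # Extract: build a lazy DataFrame over the source table.
    orders = session.table(SOURCE_TABLE)

    # Transform: keep only the rows that match a SQL filter expression.
    large_orders = orders.filter("AMOUNT > 100")

    # Store: write the transformed rows to a table, replacing earlier contents.
    large_orders.write.mode("overwrite").save_as_table(TARGET_TABLE)

    # Return a DataFrame so that the default Table() return type can display it.
    return session.table(TARGET_TABLE)
```

Because the handler returns a DataFrame, the worksheet's default return type of Table() should be able to display the stored rows.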

About Python Worksheets

Python worksheets let you use Snowpark Python in Snowsight to perform data manipulations and transformations. You can use packages available in Anaconda or import your own Python files from stages to use in scripts.

After running a Python worksheet, review the results and output returned by your script. The results display as a string, a variant, or a table, depending on your code. Refer to Running Python Worksheets.

Note

Because Python worksheets run inside Snowflake rather than in your local development environment, you cannot use session.add_import to add a file that your Python code depends on, or session.add_packages or session.add_requirements to add packages that you need to use in your Python code. Instead, you add those files to a stage and reference them in your code. Refer to Add a Python File from a Stage to a Worksheet.

Python worksheets have the following limitations:

  • Log levels lower than WARN do not appear in the Output for a Python worksheet by default. To log lower-level messages, use a logging library such as the logging module to set the level of messages logged. Messages at or above the level that you set then appear in the output.

  • No support for breakpoints or running only portions of the Python code in a worksheet.

  • No support for images or webpages. Images or webpages generated by Python code cannot be displayed in Python worksheets.

  • Snowpark Python uses Python 3.8.

If you require any of these capabilities, consider using your local development environment instead. Refer to Setting Up Your Development Environment for Snowpark Python.
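For the first limitation in the list above, a minimal sketch using the standard logging module might look like the following. The logger name worksheet_demo and the messages are hypothetical, and whether any further handler configuration is needed in your account is an assumption to verify.

```python
import logging

# Hypothetical logger name; any name works.
logger = logging.getLogger("worksheet_demo")
# Lower the threshold so that INFO messages are emitted, not only WARNING and above.
logger.setLevel(logging.INFO)


def main(session):
    logger.info("Starting the worksheet handler")
    logger.warning("This message is logged at WARNING level")
    # The handler returns a string, so the worksheet return type would be String.
    return "logging configured"
```

DEBUG messages would still be filtered out in this sketch, because the logger level is set to INFO.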

Prerequisites for Python Worksheets

To use Python worksheets, you must do the following:

Add a Python File from a Stage to a Worksheet

Snowflake includes the Anaconda packages from the Snowflake conda channel in Python worksheets. To use Python files or packages other than those included in Anaconda, upload the files to a named stage in Snowflake and then add them to the list of packages for your Python worksheet.

To use a Python package in your worksheet that is not included by default, do the following:

  1. Sign in to Snowsight.

  2. Open Worksheets.

  3. Select + » Python Worksheet.

  4. Select a database and schema.

  5. Select Packages » Stage Packages.

  6. Enter the path to the package in the stage.

    • If the selected database and schema for the worksheet contain the stage where the package is located, you can reference the stage using an unqualified name. For example, @YourStage/path/to/package.py.

    • To reference a stage in a different database and schema, fully qualify the name of the stage. For example, @Database.Schema.Stage/path/to/package.py.

  7. Select Import to add your package to the list of installed packages. You can then use import statements to use the package in your Python worksheet.

Note

Packages that you add to a worksheet are available only to that worksheet. If you want to use the same package in a different Python worksheet, follow these steps to add the package to that worksheet.
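The two path forms described in step 6 can be illustrated with a small helper that assembles the string entered under Stage Packages. The function stage_package_path is hypothetical and is not part of Snowflake or Snowpark; it only shows how the unqualified and fully qualified forms differ.

```python
from typing import Optional


def stage_package_path(
    stage: str,
    file_path: str,
    database: Optional[str] = None,
    schema: Optional[str] = None,
) -> str:
    """Build a stage path string in the form shown in step 6 (illustrative only)."""
    # A qualified name needs both parts; a partial qualifier is ambiguous.
    if (database is None) != (schema is None):
        raise ValueError("Provide both database and schema, or neither")
    qualifier = f"{database}.{schema}." if database else ""
    return f"@{qualifier}{stage}/{file_path.lstrip('/')}"


# Stage in the worksheet's own database and schema: unqualified name.
print(stage_package_path("YourStage", "path/to/package.py"))
# prints: @YourStage/path/to/package.py

# Stage in a different database and schema: fully qualified name.
print(stage_package_path("Stage", "path/to/package.py", "Database", "Schema"))
# prints: @Database.Schema.Stage/path/to/package.py
```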

For more details, refer to Making Dependencies Available to Your Code.

Start Developing with Python Worksheets

To open a worksheet and configure your development environment, do the following:

  1. Sign in to Snowsight.

  2. Open Worksheets.

  3. Select + » Python Worksheet.

  4. Select a database and schema.

  5. Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected.

    Python worksheets require a running warehouse to load Python packages and run Python code.

  6. (Optional) Select Packages to add Python libraries.

    • Packages included with Anaconda, such as numpy, pandas, requests, and urllib3, are already installed.

    • Search for other packages that you want to install and use in your worksheet, such as scikit-learn. Select the package in the search results and optionally modify the package version selected by default. Packages that you install appear at the top of the list of packages.

    • Add your own packages and Python files by selecting Stage Packages and specifying the file path of the stage and package, then selecting Import.

    If you add Python libraries to your worksheet, update your code to use import statements to import the libraries into your worksheet.

  7. Run the sample Python code to validate your configuration.

Error messages or the return value from your code appear in the Results section. To view log messages, select Output. Refer to Running Python Worksheets.
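If you want a quick check of your own in addition to the provided sample, a handler along these lines reports the context that the worksheet is running in. It is a hypothetical stand-in, not the sample code that Snowsight generates; the column aliases are arbitrary, and the type annotation from the boilerplate is omitted so that the sketch has no import requirements.

```python
def main(session):
    # Ask Snowflake which warehouse, database, and schema the worksheet is using.
    # session.sql returns a DataFrame, which the default Table() return type displays.
    return session.sql(
        "SELECT CURRENT_WAREHOUSE() AS WAREHOUSE_NAME, "
        "CURRENT_DATABASE() AS DATABASE_NAME, "
        "CURRENT_SCHEMA() AS SCHEMA_NAME"
    )
```

A single row showing the warehouse, database, and schema that you selected suggests that the worksheet is configured as intended.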

Writing Snowpark Code in Python Worksheets

After you follow the steps to start developing with Python worksheets, you can replace the sample code with your own.

Write your Snowpark Python code inside the handler function:

import snowflake.snowpark as snowpark

def main(session: snowpark.Session):
    # your code goes here

The default handler function is main, but you can change it in the Settings for the worksheet.

Use the session object provided in the boilerplate code to access data in Snowflake with the Snowpark API libraries. For example, you can create a DataFrame for a table or execute a SQL statement. Refer to the Snowpark Developer Guide for Python.
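A minimal sketch of both approaches follows. It assumes a hypothetical CUSTOMERS table with ID and NAME columns in the worksheet's selected schema, and it omits the boilerplate type annotation so that the sketch has no import requirements.

```python
def main(session):
    # Create a DataFrame for a table, then narrow it to two columns and ten rows.
    # CUSTOMERS, ID, and NAME are hypothetical names.
    customers = session.table("CUSTOMERS").select("ID", "NAME").limit(10)

    # Execute a SQL statement. collect() runs the query and returns a list of rows;
    # the first column of the first row holds the count.
    row_count = session.sql("SELECT COUNT(*) FROM CUSTOMERS").collect()[0][0]
    print(f"CUSTOMERS has {row_count} rows")

    # Return the DataFrame so that the Table() return type can display it.
    return customers
```

DataFrames are evaluated lazily, so the table query runs only when the worksheet displays the returned DataFrame, while collect() executes the SQL statement immediately.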

As you type, you see autocomplete for Python methods, defined variables, database objects, and more. You do not see autocomplete for some third-party packages or files imported from a stage. Python worksheets also include syntax highlighting and guidance for method parameters. You can enable linting and line wrapping in the Settings for the worksheet.

Return Results of a Different Data Type

When you write your Python code, consider which type of data is returned by the return statement in your code and adjust how the worksheet returns results. By default, a Python worksheet has a return type of Table() because the placeholder code returns a DataFrame.

Depending on what your Python code returns, you might want to change the worksheet settings to display the output differently:

  • If your handler function returns a DataFrame, use the default return type of Table().

  • If your handler function returns a list of Row objects, such as with the collect method, change the return type to Variant.

  • If your handler function returns a string, such as return "Hello Python", or a value that you want to cast as a string, change the return type to String.

  • If your handler function returns an integer, such as with the count method, use a return type of Variant or String.

For details about the return type of some DataFrame methods, refer to Performing an Action to Evaluate a DataFrame.
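The pairings in the list above can be sketched as four small functions, each noting the return type that the worksheet settings would need. The CUSTOMERS table and the function names are hypothetical; in a worksheet, only the function configured as the handler in Settings (main by default) is called.

```python
def return_table(session):
    # Returns a DataFrame: keep the default return type of Table().
    return session.table("CUSTOMERS")


def return_rows(session):
    # collect() returns a list of Row objects: set the return type to Variant.
    return session.table("CUSTOMERS").collect()


def return_text(session):
    # Returns a string: set the return type to String.
    return "Hello Python"


def return_count(session):
    # count() returns an integer: set the return type to Variant or String.
    return session.table("CUSTOMERS").count()
```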

To update the worksheet settings to return results of a different type, do the following:

  1. Sign in to Snowsight.

  2. Open Worksheets.

  3. Open the Python worksheet for which you want to change the return type.

  4. Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected. Make sure your warehouse is running.

  5. Select Settings and for the Return Type, select the type returned by the handler function.

  6. Run your Python worksheet.

  7. Review the results in the Results panel.

Running Python Worksheets

After you write your Python worksheet, select Run to run your Python worksheet. Running your worksheet executes all of the code in your Python worksheet. Partial or incremental execution of code is not supported.

Review Output Generated by Your Code

You can review standard output (stdout) or standard error (stderr) messages for your Python code in the Output panel for a Python worksheet.

You can see the output from the following types of functions in the Output panel:

  • Functions that write to the console, such as print().

  • Functions that print a DataFrame, such as the show method of the DataFrame class in Snowpark Python.
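As a brief illustration of both kinds of output, the handler below writes one line with print() and a short preview with the show method. The CUSTOMERS table is a hypothetical name, and the boilerplate type annotation is omitted so that the sketch has no import requirements.

```python
def main(session):
    # CUSTOMERS is a hypothetical table name.
    df = session.table("CUSTOMERS")

    # Written to stdout, so it is expected in the Output panel.
    print("Previewing CUSTOMERS")

    # show() prints up to the requested number of rows to stdout.
    df.show(5)

    # The returned DataFrame is displayed separately, in the Results panel.
    return df
```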

Note

Output appears after all Python processes finish running, rather than appearing in a stream as the code runs.

Log output is written to a temporary stage and is only captured if the following are true:

  • You select a database and schema for the worksheet.

  • The selected database was not created from a share.

  • You run the worksheet using a role that has USAGE privileges on the selected database and schema.

Review the Query History for a Python Worksheet

When a Python worksheet runs in Snowsight, an anonymous stored procedure runs the code and generates queries that execute the Snowpark commands in the code.

You can use the Query History page in Snowsight to review the queries that ran. Refer to Query History.

For example, after running a worksheet, you can review the queries that ran by doing the following:

  1. Review the Results of the worksheet.

  2. In the Query Details for the worksheet, select More options » Copy Query ID.

  3. Select Worksheets to return to the list of worksheets.

  4. Select Activity » Query History.

  5. On the Query History page, display only the queries from your Python worksheet:

    1. Select Filters, and enable the Query ID option.

    2. Enter the Query ID of your Python worksheet.

    3. Select Apply Filters.

  6. Review the queries run for the worksheet.