Writing Snowpark Code in Python Worksheets

Write Snowpark code in Python worksheets to process data using Snowpark Python in Snowsight. By writing code in Python worksheets, you can perform your development and testing in Snowflake without needing to install dependent libraries.

To develop with Python worksheets, do the following:

  1. Prepare roles and packages in Snowflake.

  2. Set up your worksheet for development.

  3. Write Snowpark code in your Python worksheet.

  4. Run your Python worksheet.

For example, you might write code in a Python worksheet that extracts data from stages or database objects in Snowflake, transforms the data, and stores the transformed data in Snowflake. You could then deploy that code as a stored procedure and build a data pipeline, all without leaving Snowflake.

About Python Worksheets

Python worksheets let you use Snowpark Python in Snowsight to perform data manipulations and transformations. You can use third-party packages listed in the Snowflake Anaconda channel or import your own Python files from stages to use in scripts.

After running a Python worksheet, review the results and output returned by your script. The results display as a string, variant, or a table, depending on your code. See Running Python Worksheets.

Note

Because Python worksheets run inside Snowflake rather than in your local development environment, you cannot use session.add_import to add a file that your Python code depends on, or session.add_packages or session.add_requirements to add packages that you need to use in your Python code. Instead, you add those files to a stage and reference them in your code. See Staging files using Snowsight.

Python worksheets have the following limitations:

  • Log levels lower than WARN do not appear in the Output for a Python worksheet by default. To log lower level messages to the output, use a logging library such as the logging module to set the level of messages logged.

  • No support for breakpoints or running only portions of the Python code in a worksheet.

  • No support for images or webpages. Images or webpages generated by Python code cannot be displayed in Python worksheets.

  • Python worksheets use Python 3.11 by default, but you can choose another supported version in Packages.

If you require support for any of these options, consider using your local development environment instead. See Setting Up Your Development Environment for Snowpark Python.

Prerequisites for Python Worksheets

To use Python worksheets, you must do the following:

Review and accept the Anaconda Terms of Service

Before you start using the packages provided by Anaconda inside Snowflake, you must acknowledge the External Offerings Terms.

Note

You must be the organization administrator (use the ORGADMIN role) to accept the terms. You only need to accept the terms once for your Snowflake account. See Enabling the ORGADMIN Role in an Account.

  1. Sign in to Snowsight.

  2. Select Admin » Billing & Terms.

  3. In the Anaconda section, select Enable.

  4. In the Anaconda Packages dialog, click the link to review the External Offerings Terms page.

  5. If you agree to the terms, select Acknowledge & Continue.

If you see an error when attempting to accept the terms of service, your user profile might be missing a first name, last name, or email address. If you have an administrator role, refer to Add user details to your user profile to update your profile using Snowsight. Otherwise, contact an administrator to update your account.

Add a Python File from a Stage to a Worksheet

Snowflake includes the Snowpark packages from the Snowflake Anaconda channel in Python worksheets.

If you want to use Python files or packages other than those included in Anaconda in your Python worksheet, you must upload the files to a named stage in Snowflake and then add them to the list of installed packages for your Python worksheet.

To use a Python package in your worksheet, do the following:

  1. Sign in to Snowsight.

  2. Open Projects » Worksheets.

  3. Select + » Python Worksheet.

  4. Select a database and schema.

  5. Select Packages » Stage Packages.

  6. Enter the path to the package in the stage:

    • If the selected database and schema for the worksheet contain the stage where the package is located, you can reference the stage using an unqualified name. For example, @YourStage/path/to/example_package.py.

    • To reference a stage in a different database and schema, fully qualify the name of the stage. For example, @Database.Schema.Stage/path/to/other_package.py.

  7. Select Import to add your package to the list of installed packages.

  8. In your code, use import statements to use the package in your Python worksheet. For example, after importing packages from the example_package.py and other_package.py files, write the following code to import a function called function from the example_package, and import the package other_package for use in your code:

    from example_package import function
    import other_package
    
    Copy

Note

Packages that you add to a worksheet are available only to that worksheet. If you want to use the same package in a different Python worksheet, use this procedure to add the package to that worksheet.

For more details, see Making Dependencies Available to Your Code.

Start Developing with Python Worksheets

To open a worksheet and configure your development environment, do the following:

  1. Sign in to Snowsight.

  2. Open Projects » Worksheets.

  3. Select + » Python Worksheet.

  4. Select a database and schema.

  5. Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected.

    Python worksheets require a running warehouse to load Python packages and run Python code.

  6. (Optional) Select Packages to install Python libraries.

    • The snowflake-snowpark-python package is required and always installed for Python worksheets.

    • Search for packages listed in the Snowflake Anaconda channel, such as numpy, pandas, requests, and urllib3. Select a package to install it for use in your worksheet, and optionally change the default package version in the list of Installed Packages.

    • Add your own packages and Python files by selecting Stage Packages and specifying the file path of the stage and package, then selecting Import. See Add a Python File from a Stage to a Worksheet.

    Packages installed by you appear under Installed Packages.

  7. If you installed Python libraries for your worksheet, add import statements to your code to use the installed libraries.

    For example, if you install the package scikit-learn for your Python worksheet, add an import statement for that package at the beginning of your code

    import scikit-learn
    
    Copy
  8. Run the sample Python code to validate your configuration.

Error messages or the return value from your code appears in the Results section. To view log messages, select Output. See Running Python Worksheets.

Writing Snowpark Code in Python Worksheets

After you follow the steps to start developing with Python worksheets, you can replace the sample code with your own.

Write your Snowpark Python code inside the handler function:

import snowflake.snowpark as snowpark

def main(session: snowpark.Session):
    # your code goes here
Copy

The default handler function is main, but you can change it in the Settings for the worksheet. The active handler is highlighted in the worksheet.

Use the session object provided in the boilerplate code to access data in Snowflake with the Snowpark API libraries. For example, you can create a DataFrame for a table or execute a SQL statement. See the Snowpark Developer Guide for Python.

As you type, you see autocomplete for Python methods, defined variables, and more. You do not see autocomplete for some third-party packages or files imported from a stage. Python worksheets also include syntax highlighting and guidance for method parameters. You can configure linting and line wrapping in the Settings for the worksheet.

Return Results of a Different Data Type

When you write your Python code, consider which type of data is returned by the return statement in your code and adjust how the worksheet returns results. By default, a Python worksheet has a return type of Table() because the placeholder code returns a DataFrame.

Depending on what your Python code returns, you might want to change the worksheet settings to display the output differently:

  • If your handler function returns a DataFrame, use the default return type of Table().

  • If your handler function returns a list of Row objects, such as with the collect method, change the return type to Variant.

  • If your handler function returns a string, such as return "Hello Python", or a value that you want to cast as a string, change the return type to String.

  • If your handler function returns an integer, such as with the count method, use a return type of Variant or String.

For details about the return type of some DataFrame methods, see Performing an Action to Evaluate a DataFrame.

To update the worksheet settings to return results of a different type, do the following:

  1. Sign in to Snowsight.

  2. Open Projects » Worksheets.

  3. Open the Python worksheet for which you want to display the results as a table.

  4. Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected. Make sure your warehouse is running.

  5. Select Settings and for the Return type, select the type returned by the handler function.

  6. Run your Python worksheet.

  7. Review the results in the Results panel.

Passing Additional Arguments to the Handler Function

With a Python worksheet, you can test a Python function that takes a single argument (a Snowpark Session object) by designating that function as the handler for the worksheet. Every function defined in a Python worksheet needs to pass in the session: snowpark.Session argument.

To test a function that passes in additional arguments, do the following:

  1. Add the arguments to your function.

  2. Define a separate, single-argument function that passes in a Snowpark Session. In this function, call the multi-argument function, passing in values for the additional arguments, then return the value of the function.

    For example, to write Snowpark Python code that filters a table of packages by the package language column, you can write the following code:

    import snowflake.snowpark as snowpark
    from snowflake.snowpark.functions import col
    
    # Add parameters with optional type hints to the main handler function
    def main(session: snowpark.Session, language: str):
      # Your code goes here, inside the "main" handler.
      table_name = 'information_schema.packages'
      dataFrame = session.table(table_name).filter(col("language") == language)
    
      # Print a sample of the dataFrame to standard output
      dataFrame.show()
    
      # The return value appears in the Results tab
      return dataFrame
    
    # Add a second function to supply a value for the language parameter to validate that your main handler function runs.
    def test_language(session: snowpark.Session):
      return main(session, 'java')
    
    Copy

    In this example, the main function is the multi-argument function and the test_language function is the single-argument function used to validate that your code runs with the passed argument values.

  3. Set the single-argument function as the handler function to run the worksheet and validate that your code runs with the argument values.

    In this example, change the handler to the test_language function and then select Run. You can change the handler in the worksheet Settings, or select the Show actions lightbulb next to the handler function and select Set function “test_language” as handler.

When you deploy your Python worksheet as a stored procedure, you can choose the main handler function and review the arguments and the mapped types for your stored procedure.

Running Python Worksheets

After you write your Python worksheet, select Run to run your Python worksheet. Running your worksheet executes all of the code in your Python worksheet. Partial or incremental execution of code is not supported.

Note

If you use a package listed in the Snowflake Anaconda channel and have not yet accepted the Anaconda terms, you might see an error about missing packages. See Using Third-Party Packages from Anaconda.

Review Output Generated by Your Code

You can review standard output (stdout) or standard error (stderr) messages for your Python code in the Output panel for a Python worksheet.

You can see the output from the following types of functions in the Output panel:

  • Functions that write to the console, such as print().

  • Functions that print a DataFrame, such as the show method of the DataFrame class in Snowpark Python.

Note

Output appears after all Python processes finish running, rather than appearing in a stream as the code runs.

Log output is written to a temporary stage and is only captured if the following are true:

  • You select a database and schema for the worksheet.

  • The selected database was not created from a share.

  • You run the worksheet using a role that has USAGE privileges on the selected database and schema.

Review the Query History for a Python Worksheet

When a Python worksheet runs in Snowsight, an anonymous stored procedure runs the code and generates queries that execute the Snowpark commands in the code.

You can use the Query History page in Snowsight to review the queries that ran. See Review Query History by using Snowsight.

For example, after running a worksheet, you can review the queries that ran by doing the following:

  1. Review the Results of the worksheet.

  2. In the Query Details for the worksheet, select More options » Copy Query ID

  3. Select Projects » Worksheets to return to the list of worksheets.

  4. Select Monitoring » Query History.

  5. On the Query History page, display only the queries from your Python worksheet:

    1. Select Filters, and enable the Query ID option.

    2. Enter the Query ID of your Python worksheet.

    3. Select Apply Filters.

  6. Review the queries run for the worksheet.

Example Code for Python Worksheets

When you write Python worksheets, you can perform data transformation and manipulation tasks, including reading data from a named stage.

You can review additional examples in Working with DataFrames in Snowpark Python.

Example: Write a Simple Snowpark Program

In this example, write a Snowpark Python program that generates a small range of numbers and writes the range to a table that your code creates, or overwrites if it already exists, in Snowflake. To run this code example, you must have the CREATE TABLE privilege on the database schema to which you want to add the table.

  1. Sign in to Snowsight.

  2. Open Projects » Worksheets.

  3. Select + » Python Worksheet.

  4. Select a database and schema that you want to add the table to.

  5. Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected. Make sure your warehouse is running.

  6. Write the Snowpark Python code as part of the main function:

    import snowflake.snowpark as snowpark
    
    def main(session: snowpark.Session):
      tableName = "range_table"
      df_range = session.range(1, 10, 2).to_df('a')
      df_range.write.mode("overwrite").save_as_table(tableName)
      return tableName + " table successfully created"
    
    Copy
  7. Select Settings and for the Return type, select String for the type returned by the handler function.

  8. Run the code.

Example: Transform Data in a Python Worksheet

In this example, write Python code that aggregates the entries in the TASK_HISTORY view in the ACCOUNT_USAGE schema of the SNOWFLAKE database by scheduled time and state and saves the aggregated output to a table, aggregate_task_history.

Note

Because this example queries account usage data, you must use a role with:

  1. Sign in to Snowsight.

  2. Open Projects » Worksheets.

  3. Select + » Python Worksheet.

  4. Select a database and schema that you want to add the table to.

  5. Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected. Make sure your warehouse is running.

  6. Write the Snowpark Python code as part of the main function:

    import snowflake.snowpark as snowpark
    from snowflake.snowpark.functions import col
    from snowflake.snowpark.dataframe_reader import *
    from snowflake.snowpark.functions import *
    
    def main(session: snowpark.Session):
    
      inputTableName = "snowflake.account_usage.task_history"
      outputTableName = "aggregate_task_history"
    
      df = session.table(inputTableName)
      df.filter(col("STATE") != "SKIPPED")\
        .group_by(("SCHEDULED_TIME"), "STATE").count()\
        .write.mode("overwrite").save_as_table(outputTableName)
      return outputTableName + " table successfully written"
    
    Copy
  7. Select Settings and for the Return type, select String for the type returned by the handler function.

  8. Run the code.

After you run your code in a Python worksheet, you can open a SQL worksheet and query the table. See Querying data using worksheets.

Example: Read Files from a Stage with Python Worksheets

Snowpark Python lets you read files from a stage and write the contents to a table or save them as a view in Snowflake. In this example, the Python code reads the contents of a compressed CSV-formatted file containing employee data, data_0_0_0.csv.gz from the db1.public.files named stage and writes the contents to a table called employees.

Note

To run this code example, you must use a role that has:

  • The USAGE privilege on the stage, database, and schema used in the code.

  • The CREATE TABLE privilege on the database schema to which you want to add the table.

  1. Sign in to Snowsight.

  2. Open Projects » Worksheets.

  3. Select + » Python Worksheet.

  4. Select a database and schema that you want to add the table to.

  5. Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected. Make sure your warehouse is running.

  6. Write the Snowpark Python code as part of the main function:

    import snowflake.snowpark as snowpark
    from snowflake.snowpark.types import *
    
    schema_for_file = StructType([
      StructField("name", StringType()),
      StructField("role", StringType())
    ])
    
    fileLocation = "@DB1.PUBLIC.FILES/data_0_0_0.csv.gz"
    outputTableName = "employees"
    
    def main(session: snowpark.Session):
      df_reader = session.read.schema(schema_for_file)
      df = df_reader.csv(fileLocation)
      df.write.mode("overwrite").save_as_table(outputTableName)
    
      return outputTableName + " table successfully written from stage"
    
    Copy
  7. Select Settings and for the Return type, select String for the type returned by the handler function.

  8. Run the code.

After you run your code in a Python worksheet, you can open a SQL worksheet and query the table. See Querying data using worksheets.

For more details about working with files in a stage using Snowpark, see Working with DataFrames in Snowpark Python.