Writing Stored Procedures in Python

This topic explains how to write a stored procedure in Python. You can use the Snowpark library within your stored procedure to perform queries, updates, and other work on tables in Snowflake.

Introduction

With Snowpark Stored Procedures, you can build and run your data pipeline within Snowflake, using a Snowflake warehouse as the compute framework. For your data pipeline’s code, you use the Snowpark API for Python to write stored procedures. To schedule the execution of these stored procedures, you use tasks.

For information on machine learning models and Snowpark Python, see Training Machine Learning Models with Snowpark Python.

The next sections provide more information about Snowpark Stored Procedures for Python.

Note

To both create and call an anonymous procedure, use CALL (with Anonymous Procedure). Creating and calling an anonymous procedure does not require a role with CREATE PROCEDURE schema privileges.

Prerequisites

You must use version 0.4.0 or a more recent version of the Snowpark library.

Snowflake currently supports writing Stored Procedures in Python version 3.8.

Note

Python stored procedures require the latest Snowpark Python library, which installs additional third party dependencies. Before trying to create a Python stored procedure, make sure the Anaconda Packages feature is enabled so that the required third party dependencies can be loaded. For more information, see Using Third-Party Packages from Anaconda.

Setting Up Your Development Environment for Snowpark

Be sure to set up your development environment to use the Snowpark library. See Setting Up Your Development Environment for Snowpark for more information.

Structuring and Building Handler Code

You can keep handler source code in-line with the SQL that creates the procedure or keep handler compiled result in a separate location and reference it from the SQL. For more information, see Keeping Handler Code In-line or on a Stage.

Limitations

Snowpark Stored Procedures have the following limitations:

  • Creating processes is not supported in stored procedures.

  • Running concurrent queries is not supported in stored procedures.

  • If you are executing your stored procedure from a task, you must specify a warehouse when creating the task. (You cannot use Snowflake-managed compute resources to run the task.)

  • You cannot use APIs that execute PUT and GET commands, including Session.sql("PUT ...") and Session.sql("GET ...").

  • When you download files from stage using session.file.get, pattern matching is not supported.

Writing the Python Code for the Stored Procedure

For your procedure’s logic, you write handler code that executes when the procedure is called. This section describes the design of a handler.

You can include this code in-line with the SQL statement that creates the procedure, or copy the code to a stage and reference it there when you create the procedure. For more information, see Keeping Handler Code In-line or on a Stage.

Planning to Write Your Stored Procedure

Limit the Amount of Memory Consumed

Snowflake places limits on a method in terms of the amount of memory needed. For more information on how to avoid consuming too much, see Designing Handlers that Stay Within Snowflake-Imposed Constraints.

Write Thread-Safe Code

Make sure that your handler method or function is thread safe.

Understand the Security Restrictions

Your handler code runs within a restricted engine, so be sure to follow the rules described in Security Practices for UDFs and Procedures.

Decide on Using Owner’s Rights or Caller’s Rights

In addition, when planning to write your stored procedure, consider whether you want the stored procedure to run with caller’s rights or owner’s rights.

Be Aware of snowflake-snowpark-python Version

The snowflake-snowpark-python library available in the Python Stored Procedure environment is usually one version behind the publicly released version. This is due to the limitations of the Stored Procedures release process. Use this SQL to find out the latest available snowflake-snowpark-python version.

select * from information_schema.packages where package_name = 'snowflake-snowpark-python' order by version desc;

Writing the Method or Function

When writing the method or function for the stored procedure, note the following:

  • Specify the Snowpark Session object as the first argument of your method or function. When you call your stored procedure, Snowflake automatically creates a Session object and passes it to your stored procedure. (You cannot create the Session object yourself.)

  • For the rest of the arguments and for the return value, use the Python types that correspond to Snowflake data types. Snowflake supports the Python data types listed in SQL-Python Data Type Mappings for Parameters and Return Types

Making Dependencies Available to Your Code

If your handler code depends on code defined outside the handler itself (such as code defined in a module) or on resource files, you can make those dependencies available to your code by uploading them to a stage. When creating the procedure, you can reference these dependencies using the IMPORTS clause.

For more information, see Making Dependencies Available to Your Code.

Accessing Data in Snowflake from Your Stored Procedure

To access data in Snowflake, use the Snowpark library APIs.

When handling a call to your Python stored procedure, Snowflake creates a Snowpark Session object and passes the object to the method or function for your stored procedure.

As is the case with stored procedures in other languages, the context for the session (e.g. the privileges, current database and schema, etc.) is determined by whether the stored procedure runs with caller’s rights or owner’s rights. For details, see Accessing and Setting the Session State.

You can use this Session object to call APIs in the Snowpark library. For example, you can create a DataFrame for a table or execute an SQL statement.

See the Snowpark Developer Guide for more information.

Data Access Example

The following is an example of a Python method that copies a specified number of rows from one table to another table. The method takes the following arguments:

  • A Snowpark Session object

  • The name of the table to copy the rows from

  • The name of the table to save the rows to

  • The number of rows to copy

The method in this example returns a string.

def run(session, from_table, to_table, count):

session.table(from_table).limit(count).write.save_as_table(to_table)
  return "SUCCESS"

Using Third-Party Packages from Anaconda

You can specify Anaconda packages to install when you create Python stored procedures. To view the list of third-party packages from Anaconda, see the Anaconda Snowflake channel. These third-party packages are built and provided by Anaconda. You may use the Snowflake conda channel for local testing and development at no cost under the Supplemental Embedded Software Terms to Anaconda’s Terms of Service.

For limitations, see Limitations.

Getting Started

Before you start using the packages provided by Anaconda inside Snowflake, you must acknowledge the Snowflake Third Party Terms by following the steps below.

Note

  • The organization administrator (ORGADMIN) role is required to complete the steps in this section. For information about organizations, see Managing Your Snowflake Organization.

  • To accept the terms, a user must have the following user properties set:

    • First name

    • Last name

    • Email address


    If the user properties are not set, the Snowflake UI displays a message telling you to update these properties before proceeding. A user administrator (i.e. user with the USERADMIN role) or a higher role, or another role with the OWNERSHIP privilege on your Snowflake user object, can add these details to your user profile.

  1. Log into Snowsight, the Snowflake web interface.

  2. Click the dropdown menu next to your login name, then click Switch Role » ORGADMIN to change to the organization administrator role.

  3. Click Admin » Billing » Terms & Billing.

  4. Scroll to the Anaconda section and click the Enable button. The Anaconda Packages (Preview Feature) dialog opens.

  5. Click the link to review the Snowflake Third Party Terms.

  6. If you agree to the terms, click the Acknowledge & Continue button.

Note

If you don’t acknowledge the Snowflake Third Party Terms as described above, you can still use stored procedures, but with these limitations:

  • You can’t use any third-party packages from Anaconda.

  • You can still specify Snowpark Python as a package in a stored procedure, but you can’t specify a specific version.

  • You can’t use the to_pandas method when interacting with a DataFrame object.

Displaying and Using Packages

You can also display all packages available and their version information by querying the PACKAGES view in the Information Schema.

select * from information_schema.packages where language = 'python';

For more information, see Using Third-Party Packages in the Snowflake Python UDF documentation.

Accessing Other Classes and Resource Files

If your code depends on other resource files or Python modules defined outside of the stored procedure, you can make those dependencies available to the handler by uploading them to a stage. For more information, see Making Dependencies Available to Your Code.

Later, when CREATE PROCEDURE statement, use the IMPORTS clause to point to these files.

Uploading Files to a Stage

To make your procedure’s logic (and other dependencies, if any) available to the procedure you’ll need to upload the files required to a stage. For more information, see Making Dependencies Available to Your Code.

Creating the Stored Procedure

For information about creating a stored procedure with SQL, see Creating a Stored Procedure.

Calling Your Stored Procedure

For information on calling a stored procedure from SQL, see Calling a Stored Procedure.