Writing Stored Procedures in Python¶
This topic explains how to write a stored procedure in Python. You can use the Snowpark library within your stored procedure to perform queries, updates, and other work on tables in Snowflake.
Introduction¶
With Snowpark Stored Procedures, you can build and run your data pipeline within Snowflake, using a Snowflake warehouse as the compute framework. For your data pipeline’s code, you use the Snowpark API for Python to write stored procedures. To schedule the execution of these stored procedures, you use tasks.
For information on machine learning models and Snowpark Python, see Training Machine Learning Models with Snowpark Python.
The next sections provide more information about Snowpark Stored Procedures for Python.
Note
To both create and call an anonymous procedure, use CALL (with Anonymous Procedure). Creating and calling an anonymous procedure does not require a role with CREATE PROCEDURE schema privileges.
Prerequisites¶
You must use version 0.4.0 or a more recent version of the Snowpark library.
Snowflake currently supports writing Stored Procedures in Python version 3.8.
Note
Python stored procedures require the latest Snowpark Python library, which installs additional third party dependencies. Before trying to create a Python stored procedure, make sure the Anaconda Packages feature is enabled so that the required third party dependencies can be loaded. For more information, see Using Third-Party Packages from Anaconda.
Setting Up Your Development Environment for Snowpark¶
Be sure to set up your development environment to use the Snowpark library. See Setting Up Your Development Environment for Snowpark for more information.
Structuring and Building Handler Code¶
You can keep handler source code in-line with the SQL that creates the procedure or keep handler compiled result in a separate location and reference it from the SQL. For more information, see Keeping Handler Code In-line or on a Stage.
Limitations¶
Snowpark Stored Procedures have the following limitations:
Creating processes is not supported in stored procedures.
Running concurrent queries is not supported in stored procedures.
If you are executing your stored procedure from a task, you must specify a warehouse when creating the task. (You cannot use Snowflake-managed compute resources to run the task.)
You cannot use APIs that execute PUT and GET commands, including
Session.sql("PUT ...")
andSession.sql("GET ...")
.When you download files from stage using
session.file.get
, pattern matching is not supported.
Writing the Python Code for the Stored Procedure¶
For your procedure’s logic, you write handler code that executes when the procedure is called. This section describes the design of a handler.
You can include this code in-line with the SQL statement that creates the procedure, or copy the code to a stage and reference it there when you create the procedure. For more information, see Keeping Handler Code In-line or on a Stage.
Planning to Write Your Stored Procedure¶
Limit the Amount of Memory Consumed¶
Snowflake places limits on a method in terms of the amount of memory needed. For more information on how to avoid consuming too much, see Designing Handlers that Stay Within Snowflake-Imposed Constraints.
Write Thread-Safe Code¶
Make sure that your handler method or function is thread safe.
Understand the Security Restrictions¶
Your handler code runs within a restricted engine, so be sure to follow the rules described in Security Practices for UDFs and Procedures.
Decide on Using Owner’s Rights or Caller’s Rights¶
In addition, when planning to write your stored procedure, consider whether you want the stored procedure to run with caller’s rights or owner’s rights.
Be Aware of snowflake-snowpark-python Version¶
The snowflake-snowpark-python library available in the Python Stored Procedure environment is usually one version behind the publicly released version. This is due to the limitations of the Stored Procedures release process. Use this SQL to find out the latest available snowflake-snowpark-python version.
select * from information_schema.packages where package_name = 'snowflake-snowpark-python' order by version desc;
Writing the Method or Function¶
When writing the method or function for the stored procedure, note the following:
Specify the Snowpark
Session
object as the first argument of your method or function. When you call your stored procedure, Snowflake automatically creates aSession
object and passes it to your stored procedure. (You cannot create theSession
object yourself.)For the rest of the arguments and for the return value, use the Python types that correspond to Snowflake data types. Snowflake supports the Python data types listed in SQL-Python Data Type Mappings for Parameters and Return Types
Making Dependencies Available to Your Code¶
If your handler code depends on code defined outside the handler itself (such as code defined in a module) or on resource files, you can make those dependencies available to your code by uploading them to a stage. When creating the procedure, you can reference these dependencies using the IMPORTS clause.
For more information, see Making Dependencies Available to Your Code.
Accessing Data in Snowflake from Your Stored Procedure¶
To access data in Snowflake, use the Snowpark library APIs.
When handling a call to your Python stored procedure, Snowflake creates a Snowpark Session
object and passes the object to the method or function for your stored procedure.
As is the case with stored procedures in other languages, the context for the session (e.g. the privileges, current database and schema, etc.) is determined by whether the stored procedure runs with caller’s rights or owner’s rights. For details, see Accessing and Setting the Session State.
You can use this Session
object to call APIs in the
Snowpark library.
For example, you can create a DataFrame for a table or execute an SQL statement.
See the Snowpark Developer Guide for more information.
Data Access Example¶
The following is an example of a Python method that copies a specified number of rows from one table to another table. The method takes the following arguments:
A Snowpark
Session
objectThe name of the table to copy the rows from
The name of the table to save the rows to
The number of rows to copy
The method in this example returns a string.
def run(session, from_table, to_table, count):
session.table(from_table).limit(count).write.save_as_table(to_table)
return "SUCCESS"
Using Third-Party Packages from Anaconda¶
You can specify Anaconda packages to install when you create Python stored procedures. To view the list of third-party packages from Anaconda, see the Anaconda Snowflake channel. These third-party packages are built and provided by Anaconda. You may use the Snowflake conda channel for local testing and development at no cost under the Supplemental Embedded Software Terms to Anaconda’s Terms of Service.
For limitations, see Limitations.
Getting Started¶
Before you start using the packages provided by Anaconda inside Snowflake, you must acknowledge the Snowflake Third Party Terms by following the steps below.
Note
The organization administrator (ORGADMIN) role is required to complete the steps in this section. For information about organizations, see Managing Your Snowflake Organization.
To accept the terms, a user must have the following user properties set:
First name
Last name
Email address
If the user properties are not set, the Snowflake UI displays a message telling you to update these properties before proceeding. A user administrator (i.e. user with the USERADMIN role) or a higher role, or another role with the OWNERSHIP privilege on your Snowflake user object, can add these details to your user profile.
Log into Snowsight, the Snowflake web interface.
Click the dropdown menu next to your login name, then click Switch Role » ORGADMIN to change to the organization administrator role.
Click Admin » Billing » Terms & Billing.
Scroll to the Anaconda section and click the Enable button. The Anaconda Packages (Preview Feature) dialog opens.
Click the link to review the Snowflake Third Party Terms.
If you agree to the terms, click the Acknowledge & Continue button.
Note
If you don’t acknowledge the Snowflake Third Party Terms as described above, you can still use stored procedures, but with these limitations:
You can’t use any third-party packages from Anaconda.
You can still specify Snowpark Python as a package in a stored procedure, but you can’t specify a specific version.
You can’t use the
to_pandas
method when interacting with a DataFrame object.
Displaying and Using Packages¶
You can also display all packages available and their version information by querying the PACKAGES view in the Information Schema.
select * from information_schema.packages where language = 'python';
For more information, see Using Third-Party Packages in the Snowflake Python UDF documentation.
Accessing Other Classes and Resource Files¶
If your code depends on other resource files or Python modules defined outside of the stored procedure, you can make those dependencies available to the handler by uploading them to a stage. For more information, see Making Dependencies Available to Your Code.
Later, when CREATE PROCEDURE statement, use the IMPORTS clause to point to these files.
Uploading Files to a Stage¶
To make your procedure’s logic (and other dependencies, if any) available to the procedure you’ll need to upload the files required to a stage. For more information, see Making Dependencies Available to Your Code.
Creating the Stored Procedure¶
For information about creating a stored procedure with SQL, see Creating a Stored Procedure.
Calling Your Stored Procedure¶
For information on calling a stored procedure from SQL, see Calling a Stored Procedure.