Writing Stored Procedures in Python¶
This topic explains how to write a stored procedure in Python. You can use the Snowpark library within your stored procedure to perform queries, updates, and other work on tables in Snowflake.
With Snowpark Stored Procedures, you can build and run your data pipeline within Snowflake, using a Snowflake warehouse as the compute framework. Build your data pipeline by using the Snowpark API for Python to write stored procedures. To schedule the execution of these stored procedures, you use tasks.
For information on machine learning models and Snowpark Python, see Training Machine Learning Models with Snowpark Python.
You can write Snowpark Stored Procedures for Python using a Python worksheet, or using a local development environment.
You can capture log and trace data as your handler code executes. For more information, refer to Logging and Tracing Overview.
To both create and call an anonymous procedure, use CALL (with Anonymous Procedure). Creating and calling an anonymous procedure does not require a role with CREATE PROCEDURE schema privileges.
Prerequisites for Writing Stored Procedures Locally¶
To write Python stored procedures in your local development environment, meet the following prerequisites:
You must use version 0.4.0 or a more recent version of the Snowpark library.
You must use Python version 3.8 to write stored procedures in Python.
Enable Anaconda Packages so that Snowpark Python can load the required third party dependencies. Refer to Using Third-Party Packages from Anaconda.
Be sure to set up your development environment to use the Snowpark library. Refer to Setting Up Your Development Environment for Snowpark.
Writing the Python Code for the Stored Procedure¶
For your procedure’s logic, you write handler code that executes when the procedure is called. This section describes the design of a handler.
You can create a stored procedure from the handler code in several ways:
Include the code in-line with the SQL statement that creates the procedure. Refer to Keeping Handler Code In-line or on a Stage.
Copy the code to a stage and reference it there when you create the procedure. Refer to Keeping Handler Code In-line or on a Stage.
Write the code in a Python worksheet and deploy the worksheet contents to a stored procedure. Refer to Creating a Python Stored Procedure to Automate Your Python Worksheet Code.
Snowpark Stored Procedures have the following limitations:
Creating processes is not supported in stored procedures.
Running concurrent queries is not supported in stored procedures.
You cannot use APIs that execute PUT and GET commands, including
When you download files from stage using
session.file.get, pattern matching is not supported.
If you execute your stored procedure from a task, you must specify a warehouse when creating the task. You cannot use Snowflake-managed compute resources to run the task.
Planning to Write Your Stored Procedure¶
Stored procedures run inside Snowflake, and so you must plan the code that you write with that in mind.
Limit the amount of memory consumed. Snowflake places limits on a method in terms of the amount of memory needed. For guidance, refer to Designing Handlers that Stay Within Snowflake-Imposed Constraints.
Make sure that your handler method or function is thread safe.
Follow the rules and security restrictions. Refer to Security Practices for UDFs and Procedures.
Decide whether you want the stored procedure to run with caller’s rights or owner’s rights.
Consider the snowflake-snowpark-python version used to run stored procedures. Due to limitations in the stored procedures release process, the snowflake-snowpark-python library available in the Python Stored Procedure environment is usually one version behind the publicly released version. Use the following SQL to find out the latest available version:
select * from information_schema.packages where package_name = 'snowflake-snowpark-python' order by version desc;
Writing the Method or Function¶
When writing the method or function for the stored procedure, note the following:
Specify the Snowpark
Sessionobject as the first argument of your method or function. When you call your stored procedure, Snowflake automatically creates a
Sessionobject and passes it to your stored procedure. (You cannot create the
For the rest of the arguments and for the return value, use the Python types that correspond to Snowflake data types. Snowflake supports the Python data types listed in SQL-Python Data Type Mappings for Parameters and Return Types
Python worksheets only support the
Sessionargument type and supports return types of String, Variant, or Table(). Refer to Return Results of a Different Data Type for more details.
Making Dependencies Available to Your Code¶
If your handler code depends on code defined outside the handler itself (such as code defined in a module) or on resource files, you can make those dependencies available to your code by uploading them to a stage. Refer to Making Dependencies Available to Your Code, or for Python worksheets, refer to Add a Python File from a Stage to a Worksheet.
If you create your stored procedure using SQL, use the IMPORTS clause when writing the CREATE PROCEDURE statement, to point to the dependency files.
Accessing Data in Snowflake from Your Stored Procedure¶
To access data in Snowflake, use the Snowpark library APIs.
When handling a call to your Python stored procedure, Snowflake creates a Snowpark
object and passes the object to the method or function for your stored procedure.
As is the case with stored procedures in other languages, the context for the session (e.g. the privileges, current database and schema, etc.) is determined by whether the stored procedure runs with caller’s rights or owner’s rights. For details, see Accessing and Setting the Session State.
You can use this
Session object to call APIs in the
For example, you can create a DataFrame for a table or execute an SQL statement.
See the Snowpark Developer Guide for more information.
Data Access Example¶
The following is an example of a Python method that copies a specified number of rows from one table to another table. The method takes the following arguments:
The name of the table to copy the rows from
The name of the table to save the rows to
The number of rows to copy
The method in this example returns a string. If you run this example in a Python worksheet, change the return type for the worksheet to a String
def run(session, from_table, to_table, count): session.table(from_table).limit(count).write.save_as_table(to_table) return "SUCCESS"
Using Third-Party Packages from Anaconda¶
You can specify Anaconda packages to install when you create Python stored procedures. To view the list of third-party packages from Anaconda, see the Anaconda Snowflake channel. These third-party packages are built and provided by Anaconda. You may use the Snowflake conda channel for local testing and development at no cost under the Supplemental Embedded Software Terms to Anaconda’s Terms of Service.
For limitations, see Limitations.
Before you start using the packages provided by Anaconda inside Snowflake, you must acknowledge the Snowflake Third Party Terms.
You must be the organization administrator (use the ORGADMIN role) to accept the terms. You only need to accept the terms once for your Snowflake account. Refer to Enabling the ORGADMIN Role for an Account.
Sign in to Snowsight.
Select Admin » Billing & Terms.
In the Anaconda section, select Enable.
In the Anaconda Packages dialog, click the link to review the Snowflake Third Party Terms page.
If you agree to the terms, select Acknowledge & Continue.
If you see an error when attempting to accept the terms of service, your user profile might be missing a first name, last name, or email address. If you have an administrator role, refer to Setting User Details and Preferences to update your profile using Snowsight. Otherwise, contact an administrator to update your account.
If you don’t acknowledge the Snowflake Third Party Terms as described above, you can still use stored procedures, but with these limitations:
You can’t use any third-party packages from Anaconda.
You can still specify Snowpark Python as a package in a stored procedure, but you can’t specify a specific version.
You can’t use the
to_pandasmethod when interacting with a DataFrame object.
Displaying and Using Packages¶
You can display all available packages and their version information by querying the PACKAGES view in the Information Schema:
select * from information_schema.packages where language = 'python';
For more information, see Using Third-Party Packages in the Snowflake Python UDF documentation.
Creating the Stored Procedure¶
You can create a stored procedure from a Python worksheet, or using SQL.
To create a stored procedure with SQL, refer to Creating a Stored Procedure.
To create a stored procedure from a Python worksheet, refer to Creating a Python Stored Procedure to Automate Your Python Worksheet Code.
Creating a Python Stored Procedure to Automate Your Python Worksheet Code¶
You can create a Python stored procedure from your Python worksheet to automate your code.
Your role must have OWNERSHIP or CREATE PROCEDURE privileges on the database schema in which you run your Python worksheet to deploy it as a stored procedure.
You can deploy any Python worksheet as a stored procedure. For details on writing Python worksheets, refer to Writing Snowpark Code in Python Worksheets.
Deploy a Python Worksheet as a Stored Procedure¶
To create a Python stored procedure to automate the code in your Python worksheet, do the following:
Sign in to Snowsight.
Open the Python worksheet that you want to deploy as a stored procedure.
Enter a name for the stored procedure.
(Optional) Enter a comment with details about the stored procedure.
(Optional) Select Replace if exists to replace an existing stored procedure with the same name.
(Optional) Select Open in Worksheets to open the stored procedure definition in a SQL worksheet.
Select Deploy to create the stored procedure.
After the stored procedure is created, you can go to the procedure details or select Done.
You can create multiple stored procedures from one Python worksheet.
After you create a stored procedure, you can automate it as part of a task. Refer to Executing SQL Statements on a Schedule Using Tasks.
Returning Tabular Data¶
You can write a procedure that returns data in tabular form. To write a procedure that returns tabular data, do the following:
TABLE(...)as the procedure’s return type in your CREATE PROCEDURE statement.
As TABLE parameters, you can specify the returned data’s column names and types if you know them. If you don’t know the returned columns when defining the procedure – such as when they’re specified at run time – you can leave out the TABLE parameters. When you do, the procedure’s return value columns will be converted from the columns in the DataFrame returned by its handler. Column data types will be converted to SQL according to the mapping specified in SQL-Python Data Type Mappings.
Write the handler so that it returns the tabular result in a Snowpark DataFrame.
For more information about dataframes, see Working with DataFrames in Snowpark Python.
The examples in this section illustrate returning tabular values from a procedure that filters for rows where a column matches a string.
Defining the Data¶
Code in the following example creates a table of employees.
CREATE OR REPLACE TABLE employees(id NUMBER, name VARCHAR, role VARCHAR); INSERT INTO employees (id, name, role) VALUES (1, 'Alice', 'op'), (2, 'Bob', 'dev'), (3, 'Cindy', 'dev');
Specifying Return Column Names and Types¶
This example specifies column names and types in the
RETURNS TABLE() statement.
CREATE OR REPLACE PROCEDURE filterByRole(tableName VARCHAR, role VARCHAR) RETURNS TABLE(id NUMBER, name VARCHAR, role VARCHAR) LANGUAGE PYTHON RUNTIME_VERSION = '3.8' PACKAGES = ('snowflake-snowpark-python') HANDLER = 'filter_by_role' AS $$ from snowflake.snowpark.functions import col def filter_by_role(session, table_name, role): df = session.table(table_name) return df.filter(col("role") == role) $$;
Omitting Return Column Names and Types¶
Code in the following example declares a procedure that allows return value column names and types to be extrapolated from columns in the
handler’s return value. It omits the column names and types from the
RETURNS TABLE() statement.
CREATE OR REPLACE PROCEDURE filterByRole(tableName VARCHAR, role VARCHAR) RETURNS TABLE() LANGUAGE PYTHON RUNTIME_VERSION = '3.8' PACKAGES = ('snowflake-snowpark-python') HANDLER = 'filter_by_role' AS $$ from snowflake.snowpark.functions import col def filter_by_role(session, table_name, role): df = session.table(table_name) return df.filter(col("role") == role) $$;
Calling the Procedure¶
The following example calls the stored procedure:
CALL filterByRole('employees', 'dev');
The procedure call produces the following output:
+----+-------+------+ | ID | NAME | ROLE | +----+-------+------+ | 2 | Bob | dev | | 3 | Cindy | dev | +----+-------+------+
Calling Your Stored Procedure¶
After creating a stored procedure, you can call it from SQL or as part of a scheduled task.
For information on calling a stored procedure from SQL, refer to Calling a Stored Procedure.
For information on calling a stored procedure as part of a scheduled task, refer to Executing SQL Statements on a Schedule Using Tasks.
Running Concurrent Tasks with Worker Processes¶
You can run concurrent tasks using Python worker processes. You might find this useful when you need to run parallel tasks that take advantage of multiple CPU cores on warehouse nodes.
Snowflake recommends that you not use the built-in Python multiprocessing module.
To work around cases where the Python Global Interpreter Lock prevents a multi-tasking approach from scaling across all CPU cores, you can execute concurrent tasks using separate worker processes, rather than threads.
You can do this on Snowflake warehouses by using the
Parallel class, as in the following example.
CREATE OR REPLACE PROCEDURE joblib_multiprocessing_proc(i INT) RETURNS STRING LANGUAGE PYTHON RUNTIME_VERSION = 3.8 HANDLER = 'joblib_multiprocessing' PACKAGES = ('snowflake-snowpark-python', 'joblib') AS $$ import joblib from math import sqrt def joblib_multiprocessing(session, i): result = joblib.Parallel(n_jobs=-1)(joblib.delayed(sqrt)(i ** 2) for i in range(10)) return str(result) $$;
The default backend used for
joblib.Parallel differs between Snowflake standard and Snowpark-optimized warehouses.
Standard warehouse default:
Snowpark-optimized warehouse default:
You can override the default backend setting by calling the
joblib.parallel_backend function, as in the following example.
import joblib joblib.parallel_backend('loky')