Writing Stored Procedures in Snowpark (Python)¶
This topic explains how to write a stored procedure in Python. You can use the Snowpark library within your stored procedure to perform queries, updates, and other work on tables in Snowflake.
Introduction¶
With Snowpark Stored Procedures, you can build and run your data pipeline within Snowflake, using a Snowflake warehouse as the compute framework. For your data pipeline’s code, you use the Snowpark API for Python to write stored procedures. To schedule the execution of these stored procedures, you use tasks.
The next sections provide more information about Snowpark Stored Procedures for Python.
Prerequisites¶
You must use version 0.4.0 or a more recent version of the Snowpark library.
Snowflake currently supports writing Stored Procedures in Python version 3.8.
Note
Python stored procedures require the latest Snowpark Python library, which installs additional third party dependencies. Before trying to create a Python stored procedure, make sure the Anaconda Packages feature is enabled so that the required third party dependencies can be loaded. For more information, see Using Third-Party Packages from Anaconda.
Setting Up Your Development Environment for Snowpark¶
Be sure to set up your development environment to use the Snowpark library. See Setting Up Your Development Environment for Snowpark for more information.
Choosing to Create a Stored Procedure with In-Line Code or with Code Uploaded from a Stage¶
As is the case with Python UDFs, you can either create an in-line stored procedure or a stored procedure with code uploaded from a stage.
In an in-line stored procedure, you write your Python code in the AS clause of the CREATE PROCEDURE statement. For example:
CREATE OR REPLACE PROCEDURE MYPROC(from_table STRING, to_table STRING, count INT)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'run'
AS
$$
def run(session, from_table, to_table, count):
    session.table(from_table).limit(count).write.save_as_table(to_table)
    return "SUCCESS"
$$;
In a stored procedure created with code uploaded from a stage, you write your Python code in a .py source file. For example, you could put the following code into a file called my_py_file.py:

def run(session, from_table, to_table, count):
    session.table(from_table).limit(count).write.save_as_table(to_table)
    return "SUCCESS"
You then upload the file to a stage and execute the CREATE PROCEDURE command, pointing to the file on the stage. For example:
CREATE OR REPLACE PROCEDURE MYPROC(from_table STRING, to_table STRING, count INT)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
IMPORTS = ('@mystage/my_py_file.py')
HANDLER = 'my_py_file.run';
Writing the Python Code for the Stored Procedure¶
For the code for your stored procedure, you must write a Python method or function.
Planning to Write Your Stored Procedure¶
As described below, the Python code for your stored procedure has the same constraints as the code for a Python UDF. When planning to write your stored procedures, you should take those constraints into consideration.
Limit the Amount of Memory Consumed¶
As is the case with Python UDFs, Snowflake places limits on the amount of memory that a method or function can consume.
In your method or function, you should avoid consuming too much memory.
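For example, keeping the work in DataFrame operations that run in the warehouse, instead of pulling every row into the stored procedure's Python process, helps keep memory consumption low. The following is a minimal sketch; the table names and filter are hypothetical:

def run(session, from_table, to_table):
    # Hypothetical filter and tables: the filter, projection, and write all run
    # in the warehouse, so the full result set is never held in the procedure's memory.
    df = session.table(from_table).filter("STATUS = 'ACTIVE'").select("ID", "STATUS")
    df.write.save_as_table(to_table)
    # Avoid calling df.collect() or df.to_pandas() on large results, since those
    # load every row into the Python process.
    return "SUCCESS"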
Write Thread-Safe Code¶
Similarly, as is the case with Python UDFs, you should make sure that your method or function is thread safe.
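For example, avoid reading and writing shared module-level state from your handler; keep working state local to the method or function. A minimal sketch (the table name is hypothetical):

# Risky: module-level mutable state is shared and is not thread safe.
# row_counts = {}

def run(session, table_name):
    # Keep working state local to the handler so concurrent calls cannot interfere.
    row_count = session.table(table_name).count()
    return str(row_count)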
Understand the Security Restrictions¶
Your method or function runs within a restricted engine, so you should follow the same rules as documented for Python UDFs.
Decide on Using Owner’s Rights or Caller’s Rights¶
In addition, when planning to write your stored procedure, consider whether you want the stored procedure to run with caller’s rights or owner’s rights.
Be Aware of snowflake-snowpark-python Version¶
The snowflake-snowpark-python library available in the Python stored procedure environment is usually one version behind the publicly released version. This is due to limitations of the stored procedure release process. Use the following SQL to find the latest available version of snowflake-snowpark-python:
select * from information_schema.packages where package_name = 'snowflake-snowpark-python' order by version desc;
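You can also run the same query through an existing Snowpark session; this sketch assumes a session object is already available:

# Assumes an existing Snowpark `session`; prints the newest available version.
rows = session.sql(
    "select version from information_schema.packages "
    "where package_name = 'snowflake-snowpark-python' "
    "order by version desc"
).collect()
print(rows[0]["VERSION"])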
Writing the Method or Function¶
When writing the method or function for the stored procedure, note the following:
Specify the Snowpark Session object as the first argument of your method or function. When you call your stored procedure, Snowflake automatically creates a Session object and passes it to your stored procedure. (You cannot create the Session object yourself.)
For the rest of the arguments and for the return value, use the Python types that correspond to Snowflake data types. Snowflake supports the Python data types listed in SQL-Python Data Type Mappings for Parameters and Return Types, as illustrated in the sketch below.
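For example, a handler for a procedure declared with STRING and INT arguments and a STRING return value could look like the following sketch (the argument names are hypothetical, and the Python type hints are optional and shown only to illustrate the mapping):

from snowflake.snowpark import Session

def run(session: Session, source_table: str, row_limit: int) -> str:
    # Snowflake creates `session` and passes it as the first argument;
    # STRING maps to str and INT maps to int for the remaining arguments.
    count = session.table(source_table).limit(row_limit).count()
    return f"Read {count} rows"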
Accessing Data in Snowflake from Your Stored Procedure¶
To access data in Snowflake, use the Snowpark library APIs.
When handling a call to your Python stored procedure, Snowflake creates a Snowpark Session object and passes the object to the method or function for your stored procedure.
As is the case with stored procedures in other languages, the context for the session (e.g. the privileges, current database and schema, etc.) is determined by whether the stored procedure runs with caller’s rights or owner’s rights. For details, see Session State.
You can use this Session object to call APIs in the Snowpark library. For example, you can create a DataFrame for a table or execute an SQL statement.
See the Snowpark Developer Guide for more information.
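For instance, the following sketch (with a hypothetical table name) executes a SQL statement through the session and reads the result:

def run(session, table_name):
    # Execute a SQL statement and read the single row that it returns.
    rows = session.sql(f"select count(*) from {table_name}").collect()
    return f"{table_name} has {rows[0][0]} rows"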
The following is an example of a Python method that copies a specified number of rows from one table to another table. The method takes the following arguments:
A Snowpark Session object
The name of the table to copy the rows from
The name of the table to save the rows to
The number of rows to copy
The method in this example returns a string.
def run(session, from_table, to_table, count):
session.table(from_table).limit(count).write.save_as_table(to_table)
return "SUCCESS"
Using Third-Party Packages from Anaconda¶
You can specify Anaconda packages to install when you create Python stored procedures. To view the list of third-party packages from Anaconda, see the Anaconda Snowflake channel. These third-party packages are built and provided by Anaconda. You may use the Snowflake conda channel for local testing and development at no cost under the Supplemental Embedded Software Terms to Anaconda’s Terms of Service.
For limitations, see Limitations.
Getting Started¶
Before you start using the packages provided by Anaconda inside Snowflake, you must acknowledge the Snowflake Third Party Terms by following the steps below.
Note
The organization administrator (ORGADMIN) role is required to complete the steps in this section. For information about organizations, see Managing Your Snowflake Organization.
To accept the terms, a user must have the following user properties set:
First name
Last name
Email address
If the user properties are not set, the Snowflake UI displays a message telling you to update these properties before proceeding. A user administrator (i.e. a user with the USERADMIN role or higher, or with another role that has the OWNERSHIP privilege on your Snowflake user object) can add these details to your user profile.
Log into Snowsight, the Snowflake web interface.
Click the dropdown menu next to your login name, then click Switch Role » ORGADMIN to change to the organization administrator role.
Click Admin » Billing » Terms & Billing.
Scroll to the Anaconda section and click the Enable button. The Anaconda Packages (Preview Feature) dialog opens.
Click the link to review the Snowflake Third Party Terms.
If you agree to the terms, click the Acknowledge & Continue button.
Note
If you don’t acknowledge the Snowflake Third Party Terms as described above, you can still use stored procedures, but with these limitations:
You can’t use any third-party packages from Anaconda.
You can still specify Snowpark Python as a package in a stored procedure, but you can’t specify a specific version.
You can’t use the
to_pandas
method when interacting with a DataFrame object.
Displaying and Using Packages¶
You can also display all available packages and their version information by querying the PACKAGES view in the Information Schema:
select * from information_schema.packages where language = 'python';
For more information, see Using Third-Party Packages in the Snowflake Python UDF documentation.
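For example, a handler can import a third-party package such as numpy, as long as the package is also listed in the PACKAGES clause of CREATE PROCEDURE. A minimal sketch; the table and column names are hypothetical:

import numpy as np

def run(session, table_name):
    # 'numpy' must be listed in the PACKAGES clause (alongside
    # 'snowflake-snowpark-python') for this import to succeed.
    amounts = [float(row[0]) for row in session.table(table_name).select("AMOUNT").collect()]
    return str(np.mean(amounts)) if amounts else "no rows"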
Accessing Other Classes and Resource Files¶
If your code depends on other resource files or Python modules defined outside of the stored procedure, you must upload those files to a stage so that the files will be available when the stored procedure executes.
Later, when executing the CREATE PROCEDURE statement, use the IMPORTS clause to point to these files.
Uploading Files to a Stage¶
Upload the files required for your stored procedure to a stage:
Choose a stage for your files.
You can use an external or named internal stage. If you plan to use the PUT command to upload the files, use a named internal stage. (The PUT command does not support uploading files to external stages.)
You can use an existing stage, or you can create a new stage by executing CREATE STAGE. For example, the following command creates a new internal stage named mystage:

CREATE STAGE mystage;
Note
The owner of the stored procedure must have the READ privilege on the stage.
Use the PUT command to upload the following files to that stage:
The file containing your Python code.
Any other files that your stored procedure depends on.
For example:
put file:///users/myusername/myfile.py @mystage AUTO_COMPRESS = FALSE OVERWRITE = TRUE ;
Note
If you omit AUTO_COMPRESS = FALSE, the PUT command automatically compresses the file. The name of the compressed file on the stage will be myfile.py.gz. Later, when you execute the CREATE PROCEDURE command, you will need to specify the filename with this .gz extension in the IMPORTS clause.
Note
The PUT command does not support uploading files to external stages. To upload files to external stages, use the utilities provided by the cloud service.
Note that if you delete or rename the file containing your Python code, you can no longer call the stored procedure.
If you need to update your file, then:
Update it while no calls to the stored procedure can be made.
Add the clause OVERWRITE=TRUE to the PUT command.
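If you develop with the Snowpark library on your client, you can alternatively upload the file with the session’s file API instead of the PUT command. A minimal sketch, assuming you already have a Snowpark session connected from your local environment:

# Runs on the client (not inside the stored procedure).
session.file.put(
    "/users/myusername/myfile.py",
    "@mystage",
    auto_compress=False,
    overwrite=True,
)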
Creating the Stored Procedure¶
Execute the CREATE PROCEDURE statement to create a stored procedure for your method or function. Set the parameters listed in the table below.
Parameter | Description
---|---
RETURNS | Specify RETURNS with the Snowflake data type of your return value.
LANGUAGE PYTHON | You must specify this to indicate that your stored procedure code is written in Python.
RUNTIME_VERSION = '3.8' | You must specify this to indicate that your stored procedure uses Python 3.8.
PACKAGES | In this list, include the package for the version of the Snowpark library that you want to use. Specify the fully qualified package name for the Snowpark library (for example, 'snowflake-snowpark-python').
IMPORTS | If your stored procedure depends on any files that you uploaded to a stage location, include those files in this list. If you are writing an in-line stored procedure, you can omit this clause unless your code depends on modules defined outside the stored procedure or on resource files. If the code for your stored procedure is uploaded from a stage, you must also include the file containing the definition of the stored procedure.
HANDLER | Set this to the fully qualified name of your Python method or function.
EXECUTE AS CALLER | If you plan to set up the stored procedure to use caller’s rights, add this parameter. Otherwise, if you want to use owner’s rights, omit this parameter.
The following examples create stored procedures in Python.
Example 1: In-line stored procedure:
CREATE OR REPLACE PROCEDURE myProc(from_table STRING, to_table STRING, count INT)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'run'
AS
$$
def run(session, from_table, to_table, count):
session.table(from_table).limit(count).write.save_as_table(to_table)
return "SUCCESS"
$$;
Example 2: A stored procedure that uses code uploaded from a stage in the Python file myfile.py on the internal stage mystage:
CREATE OR REPLACE PROCEDURE myProc(from_table STRING, to_table STRING, count INT)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
IMPORTS = ('@mystage/myfile.py')
HANDLER = 'myfile.run';
Calling Your Stored Procedure¶
In order for a user to call a stored procedure, the user’s role must have the USAGE privilege for the stored procedure.
Once you have the privileges to call the stored procedure, you can use the CALL statement to call the stored procedure. For example:
CALL myProc('table_a', 'table_b', 5);
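You can also issue the CALL statement from a Snowpark client session; a minimal sketch, assuming your session’s role has the required privilege:

# Executes the CALL statement and reads the value returned by the stored procedure.
result = session.sql("CALL myProc('table_a', 'table_b', 5)").collect()
print(result[0][0])   # e.g. "SUCCESS"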
Limitations¶
Snowpark Stored Procedures have the following limitations:
Concurrency is not supported in stored procedures. For example, from within your stored procedure, you cannot submit queries from multiple threads.
If you are executing your stored procedure from a task, you must specify a warehouse when creating the task. (You cannot use Snowflake-managed compute resources to run the task.)
You cannot use APIs that execute PUT and GET commands, including Session.sql("PUT ...") and Session.sql("GET ...").
Currently, you cannot use a schema name in the stage location when you call session.file.put or session.file.get.
When you download files from a stage using session.file.get, pattern matching is not supported.
Currently, the TensorFlow package cannot be used in stored procedures.