Writing Stored Procedures in Snowpark (Python)

This topic explains how to write a stored procedure in Python. You can use the Snowpark library within your stored procedure to perform queries, updates, and other work on tables in Snowflake.

Introduction

With Snowpark Stored Procedures, you can build and run your data pipeline within Snowflake, using a Snowflake warehouse as the compute framework. For your data pipeline’s code, you use the Snowpark API for Python to write stored procedures. To schedule the execution of these stored procedures, you use tasks.

The next sections provide more information about Snowpark Stored Procedures for Python.

Prerequisites

You must use version 0.4.0 or later of the Snowpark library.

Snowflake currently supports writing Stored Procedures in Python version 3.8.

Note

Python stored procedures require the latest Snowpark Python library, which installs additional third-party dependencies. Before trying to create a Python stored procedure, make sure the Anaconda Packages feature is enabled so that the required third-party dependencies can be loaded. For more information, see Using Third-Party Packages from Anaconda.

Setting Up Your Development Environment for Snowpark

Be sure to set up your development environment to use the Snowpark library. See Setting Up Your Development Environment for Snowpark for more information.

Choosing to Create a Stored Procedure with In-Line Code or with Code Uploaded from a Stage

As is the case with Python UDFs, you can either create an in-line stored procedure or a stored procedure with code uploaded from a stage.

  • In an in-line stored procedure, you write your Python code in the AS clause of the CREATE PROCEDURE statement. For example:

    CREATE OR REPLACE PROCEDURE MYPROC(from_table STRING, to_table STRING, count INT)
      RETURNS STRING
      LANGUAGE PYTHON
      RUNTIME_VERSION = '3.8'
      PACKAGES = ('snowflake-snowpark-python')
      HANDLER = 'run'
    AS
    $$
    def run(session, from_table, to_table, count):
      session.table(from_table).limit(count).write.save_as_table(to_table)
      return "SUCCESS"
    $$;
    
  • In a stored procedure created with code uploaded from a stage, you write your Python code in a .py source file. For example, you could put the following code into a file called my_py_file.py:

    def run(session, from_table, to_table, count):
      session.table(from_table).limit(count).write.save_as_table(to_table)
      return "SUCCESS"
    

    You then upload the file to a stage and execute the CREATE PROCEDURE command, pointing to the file on the stage. For example:

    CREATE OR REPLACE PROCEDURE MYPROC(from_table STRING, to_table STRING, count INT)
      RETURNS STRING
      LANGUAGE PYTHON
      RUNTIME_VERSION = '3.8'
      PACKAGES = ('snowflake-snowpark-python')
      IMPORTS = ('@mystage/my_py_file.py')
      HANDLER = 'my_py_file.run';
    

Writing the Python Code for the Stored Procedure

For the code for your stored procedure, you must write a Python method or function.

Planning to Write Your Stored Procedure

As described below, the Python code for your stored procedure has the same constraints as the code for a Python UDF. When planning to write your stored procedures, you should take those constraints into consideration.

Limit the Amount of Memory Consumed

As is the case with Python UDFs, Snowflake places limits on the amount of memory available to a method or function.

In your method or function, you should avoid consuming too much memory.
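
For example, here is a minimal sketch (the table and column names are illustrative) that pushes an aggregation into Snowflake rather than loading an entire table into the Python process:

def run(session, source_table):
  # Aggregate in Snowflake; only the small result set is
  # materialized in the stored procedure's memory.
  df = session.table(source_table).group_by("CATEGORY").count()
  return str(df.collect())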

Write Thread-Safe Code

Similarly, as is the case with Python UDFs, you should make sure that your method or function is thread safe.
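
For example, keep any mutable state local to the handler rather than defining it at module level (a hedged sketch; the names are illustrative):

# Avoid module-level mutable state such as:
#   results = []
# State defined at module level may be shared across invocations.

def run(session, table_name):
  # Keep working state local to each invocation.
  results = []
  results.append(session.table(table_name).count())
  return str(results[0])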

Understand the Security Restrictions

Your method or function runs within a restricted engine, so you should follow the same rules as documented for Python UDFs.

Decide on Using Owner’s Rights or Caller’s Rights

In addition, when planning to write your stored procedure, consider whether you want the stored procedure to run with caller’s rights or owner’s rights.

Be Aware of snowflake-snowpark-python Version

The snowflake-snowpark-python library available in the Python Stored Procedure environment is usually one version behind the publicly released version. This is due to the limitations of the Stored Procedures release process. Use the following SQL to find the latest available version of snowflake-snowpark-python:

select * from information_schema.packages where package_name = 'snowflake-snowpark-python' order by version desc;

Writing the Method or Function

When writing the method or function for the stored procedure, note the following:

  • Specify the Snowpark Session object as the first argument of your method or function. When you call your stored procedure, Snowflake automatically creates a Session object and passes it to your stored procedure. (You cannot create the Session object yourself.)

  • For the rest of the arguments and for the return value, use the Python types that correspond to Snowflake data types. Snowflake supports the Python data types listed in SQL-Python Data Type Mappings for Parameters and Return Types.
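
For example, a hypothetical handler that takes a STRING and a NUMBER argument and returns a STRING might look like this (a sketch; the names are illustrative and the type hints are optional):

from snowflake.snowpark import Session

def run(session: Session, table_name: str, max_rows: int) -> str:
  # The Session object is always the first argument; the remaining
  # Python types (str, int) correspond to the Snowflake argument
  # types (STRING, NUMBER) declared in CREATE PROCEDURE.
  row_count = session.table(table_name).limit(max_rows).count()
  return f"Counted {row_count} rows"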

Accessing Data in Snowflake from Your Stored Procedure

To access data in Snowflake, use the Snowpark library APIs.

When handling a call to your Python stored procedure, Snowflake creates a Snowpark Session object and passes the object to the method or function for your stored procedure.

As is the case with stored procedures in other languages, the context for the session (e.g. the privileges, current database and schema, etc.) is determined by whether the stored procedure runs with caller’s rights or owner’s rights. For details, see Session State.

You can use this Session object to call APIs in the Snowpark library. For example, you can create a DataFrame for a table or execute an SQL statement.
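
For instance, within your handler you might do something like the following (a minimal sketch; the table name and SQL statement are illustrative):

def run(session, table_name):
  # Create a DataFrame for a table and apply a transformation.
  df = session.table(table_name).filter("AMOUNT > 0")

  # Execute a SQL statement and collect the result rows.
  rows = session.sql("SELECT CURRENT_DATABASE()").collect()

  return f"{df.count()} matching rows in database {rows[0][0]}"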

See the Snowpark Developer Guide for more information.

The following is an example of a Python method that copies a specified number of rows from one table to another table. The method takes the following arguments:

  • A Snowpark Session object

  • The name of the table to copy the rows from

  • The name of the table to save the rows to

  • The number of rows to copy

The method in this example returns a string.

def run(session, from_table, to_table, count):
  session.table(from_table).limit(count).write.save_as_table(to_table)
  return "SUCCESS"

Using Third-Party Packages from Anaconda

You can specify Anaconda packages to install when you create Python stored procedures. To view the list of third-party packages from Anaconda, see the Anaconda Snowflake channel. These third-party packages are built and provided by Anaconda. You may use the Snowflake conda channel for local testing and development at no cost under the Supplemental Embedded Software Terms to Anaconda’s Terms of Service.
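
For example, assuming the pandas package is included in the procedure's PACKAGES clause (along with snowflake-snowpark-python), a handler could use that third-party package like this (a sketch; the table name is illustrative):

import pandas as pd

def run(session, table_name):
  # to_pandas requires the pandas package from the Anaconda channel.
  pdf: pd.DataFrame = session.table(table_name).limit(1000).to_pandas()
  return pdf.describe().to_string()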

For limitations, see Limitations.

Getting Started

Before you start using the packages provided by Anaconda inside Snowflake, you must acknowledge the Snowflake Third Party Terms by following the steps below.

Note

  • The organization administrator (ORGADMIN) role is required to complete the steps in this section. For information about organizations, see Managing Your Snowflake Organization.

  • To accept the terms, a user must have the following user properties set:

    • First name

    • Last name

    • Email address


    If these user properties are not set, the Snowflake UI displays a message telling you to update them before proceeding. A user administrator (that is, a user with the USERADMIN role or higher) or another role with the OWNERSHIP privilege on your Snowflake user object can add these details to your user profile.

  1. Log into Snowsight, the Snowflake web interface.

  2. Click the dropdown menu next to your login name, then click Switch Role » ORGADMIN to change to the organization administrator role.

  3. Click Admin » Billing » Terms & Billing.

  4. Scroll to the Anaconda section and click the Enable button. The Anaconda Packages (Preview Feature) dialog opens.

  5. Click the link to review the Snowflake Third Party Terms.

  6. If you agree to the terms, click the Acknowledge & Continue button.

Note

If you don’t acknowledge the Snowflake Third Party Terms as described above, you can still use stored procedures, but with these limitations:

  • You can’t use any third-party packages from Anaconda.

  • You can still specify Snowpark Python as a package in a stored procedure, but you can’t specify a specific version.

  • You can’t use the to_pandas method when interacting with a DataFrame object.

Displaying and Using Packages

You can also display all available packages and their version information by querying the PACKAGES view in the Information Schema:

select * from information_schema.packages where language = 'python';

For more information, see Using Third-Party Packages in the Snowflake Python UDF documentation.

Accessing Other Classes and Resource Files

If your code depends on other resource files or Python modules defined outside of the stored procedure, you must upload those files to a stage so that the files will be available when the stored procedure executes.

Later, when executing the CREATE PROCEDURE statement, use the IMPORTS clause to point to these files.
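
For example, suppose a hypothetical helper module named helper_module.py is uploaded to the same stage and listed in the IMPORTS clause. The handler file could then import it like any other module (a sketch):

# Contents of my_py_file.py (uploaded to the stage).
import helper_module  # helper_module.py must also be listed in IMPORTS

def run(session, table_name):
  df = session.table(table_name)
  # summarize is a hypothetical function defined in helper_module.py.
  return helper_module.summarize(df)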

Uploading Files to a Stage

Upload the files required for your stored procedure to a stage:

  1. Choose a stage for your files.

    You can use an external or named internal stage. If you plan to use the PUT command to upload the files, use a named internal stage. (The PUT command does not support uploading files to external stages.)

    You can use an existing stage, or you can create a new stage by executing CREATE STAGE. For example, the following command creates a new internal stage named mystage:

    CREATE STAGE mystage;
    

    Note

    The owner of the stored procedure must have the READ privilege on the stage.

  2. Use the PUT command to upload the following files to that stage:

    • The file containing your Python code.

    • Any other files that your stored procedure depends on.

    For example:

    PUT file:///users/myusername/myfile.py @mystage
      AUTO_COMPRESS = FALSE
      OVERWRITE = TRUE;
    

    Note

    If you omit AUTO_COMPRESS = FALSE, the PUT command automatically compresses the file. The name of the compressed file on the stage will be myfile.py.gz. Later, when you execute the CREATE PROCEDURE command, you will need to specify the filename with this .gz extension in the IMPORTS clause.

    Note

    The PUT command does not support uploading files to external stages. To upload files to external stages, use the utilities provided by the cloud service.

Note that if you delete or rename the file containing your Python code, you can no longer call the stored procedure.

If you need to update your file, then:

  • Update it while no calls to the stored procedure can be made.

  • Add the clause OVERWRITE=TRUE to the PUT command.

Creating the Stored Procedure

Execute the CREATE PROCEDURE statement to create a stored procedure for your method or function. Set the parameters described below.

[ arg_name arg_data_type [, ... ] ]

  • Omit the argument for the Snowpark Session object. As mentioned earlier, this is not a formal parameter that you specify in CREATE PROCEDURE or CALL. When you call your stored procedure, Snowflake creates a Session object and passes it to your stored procedure.

  • For the data types of the rest of the arguments, use the Snowflake data types that correspond to the Python types of the arguments in your method or function.

RETURNS result_data_type

Specify RETURNS with the Snowflake data type of your return value.

LANGUAGE PYTHON

You must specify this to indicate that your stored procedure code is written in Python.

RUNTIME_VERSION = '3.8'

You must specify this to indicate that your stored procedure uses Python 3.8.

PACKAGES = ( 'package_name' )

In this list, include the package for the version of the Snowpark library that you want to use.

Specify the fully qualified package name for the Snowpark library in the following format:

snowflake-snowpark-python==<version>

For example:

  • To use the latest version of Snowpark, use:

    PACKAGES = ('snowflake-snowpark-python')
    
  • To use the 0.4.0 version of Snowpark, use:

    PACKAGES = ('snowflake-snowpark-python==0.4.0')
    

IMPORTS = ( 'file' [, 'file' ... ] )

If your stored procedure depends on any files that you uploaded to a stage location, include those files in this list.

If you are writing an in-line stored procedure, you can omit this clause, unless your code depends on Python modules defined outside the stored procedure or on resource files.

If you are creating the stored procedure with code uploaded from a stage, you must also include the file containing the handler for the stored procedure.

HANDLER = 'method_name'

Set this to the fully qualified name of your Python method or function.

EXECUTE AS CALLER

If you plan to set up the stored procedure to use caller’s rights, add this parameter.

Otherwise, if you want to use owner’s rights, omit this parameter.

The following examples create stored procedures in Python.

Example 1: In-line stored procedure:

CREATE OR REPLACE PROCEDURE myProc(from_table STRING, to_table STRING, count INT)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'run'
AS
$$
def run(session, from_table, to_table, count):
  session.table(from_table).limit(count).write.save_as_table(to_table)
  return "SUCCESS"
$$;

Example 2: A stored procedure whose code is uploaded from the Python file myfile.py on the internal stage mystage:

CREATE OR REPLACE PROCEDURE myProc(from_table STRING, to_table STRING, count INT)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
IMPORTS = ('@mystage/myfile.py')
HANDLER = 'myfile.run';

Calling Your Stored Procedure

In order for a user to call a stored procedure, the user’s role must have the USAGE privilege for the stored procedure.

Once you have the privileges to call the stored procedure, you can use the CALL statement to call the stored procedure. For example:

CALL myProc('table_a', 'table_b', 5);
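
If you are calling the stored procedure from your own Snowpark Python session (rather than from a SQL worksheet), a minimal sketch might look like this (replace the placeholder connection details with your own):

from snowflake.snowpark import Session

# Replace these placeholder values with your own connection details.
connection_parameters = {
  "account": "<account_identifier>",
  "user": "<username>",
  "password": "<password>",
  "role": "<role>",
  "warehouse": "<warehouse>",
  "database": "<database>",
  "schema": "<schema>",
}

session = Session.builder.configs(connection_parameters).create()

# The procedure's return value is in the first column of the first row.
result = session.sql("CALL myProc('table_a', 'table_b', 5)").collect()
print(result[0][0])  # SUCCESS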

Limitations

Snowpark Stored Procedures have the following limitations:

  • Concurrency is not supported in stored procedures. For example, from within your stored procedure, you cannot submit queries from multiple threads.

  • If you are executing your stored procedure from a task, you must specify a warehouse when creating the task. (You cannot use Snowflake-managed compute resources to run the task.)

  • You cannot use APIs that execute PUT and GET commands, including Session.sql("PUT ...") and Session.sql("GET ...").

  • Currently, you cannot use a schema name in the stage location when you call session.file.put or session.file.get.

  • When you download files from a stage using session.file.get, pattern matching is not supported.

  • Currently, the TensorFlow package cannot be used in stored procedures.
