Creating Python UDFs

This topic shows how to create and install a Python UDF (user-defined function).

Writing the Python Code

Writing the Python Module and Function

Write a module that follows the specifications below:

  • Define the module. A module is a file containing Python definitions and statements.

  • Define a function inside the module.

  • If the function accepts arguments, each argument must be one of the data types specified in the Python Data Type column of the SQL-Python Type Mappings table.

    Function arguments are bound by position, not name. The first argument passed to the UDF is the first argument received by the Python function.

  • Specify an appropriate return value. Because a Python UDF must be a scalar function, it must return one value each time that it is invoked. The type of the return value must be one of the data types specified in the Python Data Type column of the SQL-Python Type Mappings table. The type of the return value must be compatible with the SQL data type specified in the RETURNS clause of the CREATE FUNCTION statement.

  • Your module can contain more than one function. The function that is called by Snowflake can call other functions in the same module, or in other modules.

  • Your function (and any functions called by your function) must comply with the Snowflake-imposed constraints for Python UDFs.

Note

Vectorized Python UDFs let you define Python functions that receive batches of input rows as Pandas DataFrames and return batches of results as Pandas arrays or Series. For more information, see Vectorized Python UDFs.

Reading and Writing Files with a UDF Handler

You can read and write files with UDF handler code. To do this securely within the restricted engine in which Snowflake executes UDFs, follow the guidelines described here.

Reading Files with a UDF Handler

A UDF handler can read files that have been uploaded to a Snowflake stage. The stage hosting the file must be readable by the owner of the UDF.

When you specify a file’s stage location in the IMPORTS clause of CREATE FUNCTION, Snowflake copies the staged file to an import directory available specifically to the UDF. Your handler code can read the file from there.

Note that Snowflake copies all imported files, potentially from multiple stages, to a single import directory. For this reason, the names of files specified in the IMPORTS clause must be distinct relative to each other.

For example code, see Loading a File from a Stage into a Python UDF in this topic.

To read a file with UDF handler code:

  1. Copy the file to Snowflake stage.

    You can use the PUT command to upload files from a local directory on a client computer; for more information, see PUT. For more general information about loading files to a stage, see Overview of Data Loading.

  2. When creating the UDF using CREATE FUNCTION, specify the location of the file in the IMPORTS clause.

    Code in the following example specifies a file.txt file at a stage called my_stage.

    create or replace function my_udf()
       ...
       imports=('@my_stage/file.txt')
       ...
    
    Copy
  3. In your handler code, read the file from the import directory.

    Snowflake copies the staged file to the UDF’s import directory. You can retrieve the directory’s location using the snowflake_import_directory system option.

    In Python code, you can retrieve the directory’s location using the Python sys._xoptions method, as in the following example:

    IMPORT_DIRECTORY_NAME = "snowflake_import_directory"
    import_dir = sys._xoptions[IMPORT_DIRECTORY_NAME]
    
    def compute():
       with open(import_dir + 'file.txt', 'r') as file:
          return file.read()
    
    Copy

Writing Files with a UDF Handler

A UDF handler can write files to a /tmp directory created for the query calling the UDF.

Keep in mind that a /tmp directory is set aside for a single calling query, yet multiple Python worker processes might be running at the same time. To prevent collisions, you must ensure either that access to the /tmp directory is synchronized with other Python worker processes or that the names of files written to /tmp are unique.

For example code, see Unzipping a Staged File in this topic.

Code in the following example writes the input text to the /tmp directory, appending the function’s process ID to ensure the file location’s uniqueness.

def func(text):
   # Ensure the file name's uniqueness by appending the function's process ID.
   file_path = '/tmp/content' + str(os.getpid())
   with open(file_path, "w") as file:
      file.write(text)
Copy

Creating the Function in Snowflake

You must execute a CREATE FUNCTION statement to specify:

  • The SQL function name to use.

  • The name of the Python function to call when the Python UDF is called.

The name of the UDF does not need to match the name of the handler function written in Python. The CREATE FUNCTION statement associates the UDF name with the Python function.

When choosing a name for the UDF:

  • Follow the rules for Object Identifiers.

  • Choose a name that is unique, or follow the rules for Overloading.

    Important

    Unlike overloading for SQL UDFs, which distinguishes among functions based on both the number and the data types of the arguments, Python UDFs distinguish among functions based only on the number of arguments.

Function arguments are bound by position, not name. The first argument passed to the UDF is the first argument received by the Python function.

For information about the data types of arguments, see SQL-Python Data Type Mappings.

UDFs With In-line Code vs. UDFs With Code Uploaded from a Stage

The code for a Python UDF can be specified either of the following ways:

  • Uploaded from a stage: The CREATE FUNCTION statement specifies the location of an existing Python source code in a stage.

  • In-line: The CREATE FUNCTION statement specifies the Python source code.

Creating an In-line Python UDF

For an in-line UDF, you supply the Python source code as part of the CREATE FUNCTION statement.

For example, the following statement creates an in-line Python UDF that adds one to a given integer:

create or replace function addone(i int)
returns int
language python
runtime_version = '3.8'
handler = 'addone_py'
as
$$
def addone_py(i):
  return i+1
$$;
Copy

The Python source code is specified in the AS clause. The source code can be surrounded by either single quotes or by a pair of dollar signs ($$). Using the double dollar signs is usually easier if the source code contains embedded single quotes.

Call the UDF:

select addone(10);
Copy

Here is the output:

+------------+
| ADDONE(10) |
|------------|
|         11 |
+------------+
Copy

The Python source code can contain more than one module, and more than one function in a module, so the HANDLER clause specifies the module and function to call.

An in-line Python UDF can call code in modules that are included in the IMPORTS clause.

For more details about the syntax of the CREATE FUNCTION statement, see CREATE FUNCTION.

For more examples, see in-line Python UDF examples.

Creating a Python UDF With Code Uploaded from a Stage

The following statements create a simple Python UDF using code uploaded from a stage. The stage hosting the file must be readable by the owner of the UDF. Also, ZIP files must be self-contained and not rely on any additional setup scripts to be executed.

Create a Python file named sleepy.py that contains your source code:

def snore(n):   # return a series of n snores
    result = []
    for a in range(n):
        result.append("Zzz")
    return result
Copy

Launch the SnowSQL (CLI Client) and use the PUT command to copy the file from the local file system to the default user stage, named @~. (The PUT command cannot be executed through the Snowflake GUI.)

put
file:///Users/Me/sleepy.py
@~/
auto_compress = false
overwrite = true
;
Copy

If you delete or rename the file, you can no longer call the UDF. If you need to update your file, then update it while no calls to the UDF can be made. If the old file is still in the stage, the PUT command should include the clause OVERWRITE=TRUE.

Create the UDF. The handler specifies the module and the function.

create or replace function dream(i int)
returns variant
language python
runtime_version = '3.8'
handler = 'sleepy.snore'
imports = ('@~/sleepy.py')
Copy

Call the UDF:

select dream(3);

+----------+
| DREAM(3) |
|----------|
| [        |
|   "Zzz", |
|   "Zzz", |
|   "Zzz"  |
| ]        |
+----------+
Copy

Specifying Multiple Import Files

Here is an example of how to specify multiple import files.

create or replace function multiple_import_files(s string)
returns string
language python
runtime_version=3.8
imports=('@python_udf_dep/bar/python_imports_a.zip', '@python_udf_dep/foo/python_imports_b.zip')
handler='compute'
as
$$
def compute(s):
  return s
$$;
Copy

Note

The import file names specified must be different. For example, this will not work: imports=('@python_udf_dep/bar/python_imports.zip', '@python_udf_dep/foo/python_imports.zip').

Granting Privileges on the Function

For any role other than the owner of the function to call the function, the owner must grant the appropriate privileges to the role.

The GRANT statements for a Python UDF are essentially identical to the GRANT statements for other UDFs, such as JavaScript UDFs.

For example:

GRANT USAGE ON FUNCTION my_python_udf(number, number) TO my_role;
Copy

Examples

Using an Imported Package in an In-line Python UDF

A curated list of 3rd party packages from Anaconda is available. For more information, see Using Third-Party Packages.

Note

Before you can use the packages provided by Anaconda, your Snowflake organization administrator must acknowledge the Snowflake Third Party Terms. For more information, see Getting Started.

The following code shows how to import packages and return their versions.

Create the UDF:

create or replace function py_udf()
returns variant
language python
runtime_version = 3.8
packages = ('numpy','pandas','xgboost==1.5.0')
handler = 'udf'
as $$
import numpy as np
import pandas as pd
import xgboost as xgb
def udf():
    return [np.__version__, pd.__version__, xgb.__version__]
$$;
Copy

Call the UDF:

select py_udf();
Copy

Here is the output:

+-------------+
| PY_UDF()    |
|-------------|
| [           |
|   "1.19.2", |
|   "1.4.0",  |
|   "1.5.0"   |
| ]           |
+-------------+
Copy

Loading a File from a Stage into a Python UDF

This example shows how to import non-code files into a Python UDF from a stage. The file will only be read once during UDF creation, and will not be read again during UDF execution if reading the file happens outside of the target handler. For more about reading a file, see Reading Files with a UDF Handler.

Note

You can only import files from the top-level directory on a stage, not subfolders.

Create the UDF:

create or replace function my_udf()
  returns string
  language python
  runtime_version=3.8
  imports=('@my_stage/file.txt')
  handler='compute'
as
$$
import sys
import os

with open(os.path.join(sys._xoptions["snowflake_import_directory"], 'file.txt'), "r") as f:
    s = f.read()

def compute():
    return s
$$;
Copy

Unzipping a Staged File

You can store a .zip file on a stage, then unzip it in a UDF by using the Python zipfile module.

For example, you can upload a .zip file to a stage, then reference the .zip file at its staged location in the IMPORTS clause when you create the UDF. At run time, Snowflake will copy the staged file into an import directory from which your code can access it.

For more about reading and writing files, see Reading and Writing Files with a UDF Handler.

In the following example, the UDF code uses an NLP model to discover entities in text. The code returns an array of these entities. To set up the NLP model for processing the text, the code first uses the zipfile module to extract the file for the model (en_core_web_sm-2.3.1) from a .zip file. The code then uses the spaCy module to load the model from the file.

Note that the code writes extracted file contents to the /tmp directory created for the query calling this function. The code uses file locks to ensure that the extraction is synchronized across Python worker processes; this way, contents are unzipped only once. For more about writing files, see Writing Files with a UDF Handler.

For more about the zipfile module, see the zipfile reference. For more about the spaCy module, see the spaCy API documentation.

Create the UDF:

create or replace function py_spacy(str string)
   returns array
   language python
   runtime_version = 3.8
   handler = 'func'
   packages = ('spacy')
   imports = ('@spacy_stage/spacy_en_core_web_sm.zip')
   as
$$
import fcntl
import os
import spacy
import sys
import threading
import zipfile

# File lock class for synchronizing write access to /tmp
class FileLock:
   def __enter__(self):
      self._lock = threading.Lock()
      self._lock.acquire()
      self._fd = open('/tmp/lockfile.LOCK', 'w+')
      fcntl.lockf(self._fd, fcntl.LOCK_EX)

   def __exit__(self, type, value, traceback):
      self._fd.close()
      self._lock.release()

# Get the location of the import directory. Snowflake sets the import
# directory location so code can retrieve the location via sys._xoptions.
IMPORT_DIRECTORY_NAME = "snowflake_import_directory"
import_dir = sys._xoptions[IMPORT_DIRECTORY_NAME]

# Get the path to the ZIP file and set the location to extract to.
zip_file_path = import_dir + "spacy_en_core_web_sm.zip"
extracted = '/tmp/en_core_web_sm'

# Extract the contents of the ZIP. This is done under the file lock
# to ensure that only one worker process unzips the contents.
with FileLock():
   if not os.path.isdir(extracted + '/en_core_web_sm/en_core_web_sm-2.3.1'):
      with zipfile.ZipFile(zip_file_path, 'r') as myzip:
         myzip.extractall(extracted)

# Load the model from the extracted file.
nlp = spacy.load(extracted + "/en_core_web_sm/en_core_web_sm-2.3.1")

def func(text):
   doc = nlp(text)
   result = []

   for ent in doc.ents:
      result.append((ent.text, ent.start_char, ent.end_char, ent.label_))
   return result
$$;
Copy

NULL Handling in Python UDFs

The following code shows how NULL values are handled. For more information, see NULL Values.

Create the UDF:

create or replace function py_udf_null(a variant)
returns string
language python
runtime_version = 3.8
handler = 'udf'
as $$

def udf(a):
    if not a:
        return 'JSON null'
    elif getattr(a, "is_sql_null", False):
        return 'SQL null'
    else:
        return 'not null'
$$;
Copy

Call the UDF:

select py_udf_null(null);
select py_udf_null(parse_json('null'));
select py_udf_null(10);
Copy

Here is the output:

+-------------------+
| PY_UDF_NULL(NULL) |
|-------------------|
| SQL null          |
+-------------------+

+---------------------------------+
| PY_UDF_NULL(PARSE_JSON('NULL')) |
|---------------------------------|
| JSON null                       |
+---------------------------------+

+-----------------+
| PY_UDF_NULL(10) |
|-----------------|
| not null        |
+-----------------+
Copy