# Creating Python UDFs

This topic shows how to create and install a Python UDF (user-defined function).

## Writing the Python Code

### Writing the Python Module and Function

Write a module that follows the specifications below:

• Define the module. A module is a file containing Python definitions and statements.

• Define a function inside the module.

• If the function accepts arguments, each argument must be one of the data types specified in the Python Data Type column of the SQL-Python Type Mappings table.

Function arguments are bound by position, not name. The first argument passed to the UDF is the first argument received by the Python function.

• Specify an appropriate return value. Because a Python UDF must be a scalar function, it must return one value each time that it is invoked. The type of the return value must be one of the data types specified in the Python Data Type column of the SQL-Python Type Mappings table. The type of the return value must be compatible with the SQL data type specified in the RETURNS clause of the CREATE FUNCTION statement.

• Your module can contain more than one function. The function that is called by Snowflake can call other functions in the same module, or in other modules.

• Your function (and any functions called by your function) must comply with the Snowflake-imposed constraints for Python UDFs.
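The module structure described above can be sketched as follows. The names here (`echo_varchar`, `helper`) are illustrative, not required by Snowflake; the handler function is selected by name in the CREATE FUNCTION statement.

```python
# Minimal sketch of a Python UDF handler module.

def helper(s):
    # The handler can call other functions in the same module.
    return s.strip()

def echo_varchar(s):
    # Scalar UDF handler: receives one row's arguments (bound by position)
    # and returns a single value each time it is invoked.
    return helper(s)
```

Calling `echo_varchar("  hi ")` locally returns `"hi"`, which is the same value the UDF would return for that input in Snowflake.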

Note

There is a Python UDF batch API, which enables defining Python functions that receive batches of input rows as Pandas DataFrames and return batches of results as Pandas arrays or Series. For more information, see Python UDF Batch API.
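As a rough illustration of the batch shape only, a batch handler operates on a whole pandas DataFrame of input rows at once and returns one result per row. The sketch below omits the Snowflake-specific wiring (the vectorized registration provided by Snowflake's runtime) and shows just the data flow, under that assumption.

```python
import pandas as pd

# Sketch of a batch-style handler: the input batch arrives as a DataFrame
# with one column per UDF argument; the handler returns a Series with one
# result per input row. The Snowflake vectorized registration is omitted.

def add_one_batch(df: pd.DataFrame) -> pd.Series:
    # Operate on the whole column at once instead of row by row.
    return df[0] + 1

batch = pd.DataFrame({0: [1, 2, 3]})
result = add_one_batch(batch)
```

Processing a column at a time like this is what makes the batch API attractive for vectorizable workloads.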

### Reading and Writing Files with a UDF Handler

You can read and write files with UDF handler code. To do this securely within the restricted engine in which Snowflake executes UDFs, follow the guidelines described here.

#### Reading Files with a UDF Handler

A UDF handler can read files that have been uploaded to a Snowflake stage. The stage hosting the file must be readable by the owner of the UDF.

When you specify a file’s stage location in the IMPORTS clause of CREATE FUNCTION, Snowflake copies the staged file to an import directory available specifically to the UDF. Your handler code can read the file from there.

Note that Snowflake copies all imported files, potentially from multiple stages, to a single import directory. For this reason, the names of files specified in the IMPORTS clause must be distinct relative to each other.

For example code, see Loading a File from a Stage into a Python UDF in this topic.

To read a file with UDF handler code:

1. Copy the file to a Snowflake stage.

2. When creating the UDF using CREATE FUNCTION, specify the location of the file in the IMPORTS clause.

Code in the following example specifies a file.txt file at a stage called my_stage.

create or replace function my_udf()
...
imports=('@my_stage/file.txt')
...

3. In your handler code, read the file from the import directory.

Snowflake copies the staged file to the UDF’s import directory. You can retrieve the directory’s location using the snowflake_import_directory system option.

In Python code, you can retrieve the directory’s location from the Python sys._xoptions dictionary, as in the following example:

IMPORT_DIRECTORY_NAME = "snowflake_import_directory"
import_dir = sys._xoptions[IMPORT_DIRECTORY_NAME]

def compute():
    with open(import_dir + 'file.txt', 'r') as file:
        return file.read()
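Outside Snowflake, sys._xoptions has no "snowflake_import_directory" entry, so the read pattern can be exercised locally with a temporary directory standing in for the UDF's import directory. This is a sketch under that assumption, not Snowflake behavior.

```python
import os
import sys
import tempfile

# Use the real import directory when present; otherwise fall back to a
# temporary directory as a local stand-in.
import_dir = sys._xoptions.get("snowflake_import_directory",
                               tempfile.mkdtemp() + os.sep)

# Stage a file the way Snowflake would copy an imported file into place.
with open(os.path.join(import_dir, 'file.txt'), 'w') as f:
    f.write('hello')

def compute():
    # Same read pattern a UDF handler would use against the import directory.
    with open(os.path.join(import_dir, 'file.txt'), 'r') as file:
        return file.read()
```

Here `compute()` returns the staged file's contents, mirroring what the handler would see in Snowflake.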


#### Writing Files with a UDF Handler

A UDF handler can write files to a /tmp directory created for the query calling the UDF.

Keep in mind that a /tmp directory is set aside for a single calling query, yet multiple Python worker processes might be running at the same time. To prevent collisions, you must ensure either that access to the /tmp directory is synchronized with other Python worker processes or that the names of files written to /tmp are unique.

For example code, see Unzipping a Staged File in this topic.

Code in the following example writes the input text to the /tmp directory, appending the function’s process ID to ensure the file location’s uniqueness.

import os

def func(text):
    # Ensure the file name's uniqueness by appending the function's process ID.
    file_path = '/tmp/content' + str(os.getpid())
    with open(file_path, "w") as file:
        file.write(text)
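The same pattern can be tried locally; the sketch below uses tempfile.gettempdir() as a stand-in for the per-query /tmp directory that Snowflake sets aside, and returns the path so the result can be inspected.

```python
import os
import tempfile

def write_unique(text):
    # Append the process ID so concurrent worker processes never collide
    # on the same file name. gettempdir() stands in for Snowflake's /tmp.
    file_path = os.path.join(tempfile.gettempdir(),
                             'content' + str(os.getpid()))
    with open(file_path, 'w') as file:
        file.write(text)
    return file_path

path = write_unique('hello')
```

Two worker processes running this code would write to two different paths, which is exactly the collision-avoidance property the text describes.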


## Creating the Function in Snowflake

You must execute a CREATE FUNCTION statement to specify:

• The SQL function name to use.

• The name of the Python function to call when the Python UDF is called.

The name of the UDF does not need to match the name of the handler function written in Python. The CREATE FUNCTION statement associates the UDF name with the Python function.

When choosing a name for the UDF:

• Follow the rules for Object Identifiers.

Important

Unlike overloading for SQL UDFs, which distinguishes among functions based on both the number and the data types of the arguments, Python UDFs distinguish among functions based only on the number of arguments.

Function arguments are bound by position, not name. The first argument passed to the UDF is the first argument received by the Python function.

For information about the data types of arguments, see SQL-Python Data Type Mappings for Parameters and Return Types.

## UDFs With In-line Code vs. UDFs With Code Uploaded from a Stage

The code for a Python UDF can be specified in either of the following ways:

• Uploaded from a stage: The CREATE FUNCTION statement specifies the location of existing Python source code on a stage.

• In-line: The CREATE FUNCTION statement specifies the Python source code itself.

### Creating an In-line Python UDF

For an in-line UDF, you supply the Python source code as part of the CREATE FUNCTION statement.

For example, the following statement creates an in-line Python UDF that adds one to a given integer:

create or replace function addone(i int)
returns int
language python
runtime_version = '3.8'
handler = 'addone_py'
as
$$
def addone_py(i):
    return i + 1
$$;


Note

The import file names specified must be distinct from each other. For example, this will not work: imports=('@python_udf_dep/bar/python_imports.zip', '@python_udf_dep/foo/python_imports.zip').

## Granting Privileges on the Function

For any role other than the owner of the function to call the function, the owner must grant the appropriate privileges to the role.

The GRANT statements for a Python UDF are essentially identical to the GRANT statements for other UDFs, such as JavaScript UDFs. For example:

GRANT USAGE ON FUNCTION my_python_udf(number, number) TO my_role;

## Examples

### Using an Imported Package in an In-line Python UDF

A curated list of third-party packages from Anaconda is available. For more information, see Using Third-Party Packages.

Note

Before you can use the packages provided by Anaconda, your Snowflake organization administrator must acknowledge the Snowflake Third Party Terms. For more information, see Getting Started.

The following code shows how to import packages and return their versions.

Create the UDF:

create or replace function py_udf()
returns variant
language python
runtime_version = 3.8
packages = ('numpy','pandas','xgboost==1.5.0')
handler = 'udf'
as $$
import numpy as np
import pandas as pd
import xgboost as xgb

def udf():
    return [np.__version__, pd.__version__, xgb.__version__]
$$;

Call the UDF:

select py_udf();

Here is the output:

+-------------+
| PY_UDF()    |
|-------------|
| [           |
|   "1.19.2", |
|   "1.4.0",  |
|   "1.5.0"   |
| ]           |
+-------------+

### Loading a File from a Stage into a Python UDF

This example shows how to import non-code files into a Python UDF from a stage. The file is read only once, during UDF creation; it is not read again during UDF execution, as long as the read happens outside of the target handler. For more about reading a file, see Reading Files with a UDF Handler.

Note

You can only import files from the top-level directory on a stage, not subfolders.

Create the UDF:

create or replace function my_udf()
returns string
language python
runtime_version = 3.8
imports = ('@my_stage/file.txt')
handler = 'compute'
as $$
import sys
import os

with open(os.path.join(sys._xoptions["snowflake_import_directory"], 'file.txt'), "r") as f:
    s = f.read()

def compute():
    return s
$$;

### Unzipping a Staged File

You can store a .zip file on a stage, then unzip it in a UDF by using the Python zipfile module.

For example, you can upload a .zip file to a stage, then reference the .zip file at its staged location in the IMPORTS clause when you create the UDF. At run time, Snowflake will copy the staged file into an import directory from which your code can access it. For more about reading and writing files, see Reading and Writing Files with a UDF Handler.

In the following example, the UDF code uses an NLP model to discover entities in text. The code returns an array of these entities. To set up the NLP model for processing the text, the code first uses the zipfile module to extract the file for the model (en_core_web_sm-2.3.1) from a .zip file. The code then uses the spaCy module to load the model from the file.

Note that the code writes extracted file contents to the /tmp directory created for the query calling this function. The code uses file locks to ensure that the extraction is synchronized across Python worker processes; this way, contents are unzipped only once. For more about writing files, see Writing Files with a UDF Handler.

For more about the zipfile module, see the zipfile reference. For more about the spaCy module, see the spaCy API documentation.

Create the UDF:

create or replace function py_spacy(str string)
returns array
language python
runtime_version = 3.8
handler = 'func'
packages = ('spacy')
imports = ('@spacy_stage/spacy_en_core_web_sm.zip')
as $$
import fcntl
import os
import spacy
import sys
import threading
import zipfile

# File lock class for synchronizing write access to /tmp.
class FileLock:
    def __enter__(self):
        self._lock = threading.Lock()
        self._lock.acquire()
        self._fd = open('/tmp/lockfile.LOCK', 'w+')
        fcntl.lockf(self._fd, fcntl.LOCK_EX)

    def __exit__(self, type, value, traceback):
        self._fd.close()
        self._lock.release()

# Get the location of the import directory. Snowflake sets the import
# directory location so code can retrieve the location via sys._xoptions.
IMPORT_DIRECTORY_NAME = "snowflake_import_directory"
import_dir = sys._xoptions[IMPORT_DIRECTORY_NAME]

# Get the path to the ZIP file and set the location to extract to.
zip_file_path = import_dir + "spacy_en_core_web_sm.zip"
extracted = '/tmp/en_core_web_sm'

# Extract the contents of the ZIP. This is done under the file lock
# to ensure that only one worker process unzips the contents.
with FileLock():
    if not os.path.isdir(extracted + '/en_core_web_sm/en_core_web_sm-2.3.1'):
        with zipfile.ZipFile(zip_file_path, 'r') as myzip:
            myzip.extractall(extracted)

# Load the model from the extracted file.
nlp = spacy.load(extracted + "/en_core_web_sm/en_core_web_sm-2.3.1")

def func(text):
    doc = nlp(text)
    result = []

    for ent in doc.ents:
        result.append((ent.text, ent.start_char, ent.end_char, ent.label_))
    return result
$$;

### NULL Handling in Python UDFs

The following code shows how NULL values are handled. For more information, see NULL Values.

Create the UDF:

create or replace function py_udf_null(a variant)
returns string
language python
runtime_version = 3.8
handler = 'udf'
as $$

def udf(a):
    if not a:
        return 'JSON null'
    elif getattr(a, "is_sql_null", False):
        return 'SQL null'
    else:
        return 'not null'
$$;


Call the UDF:

select py_udf_null(null);
select py_udf_null(parse_json('null'));
select py_udf_null(10);


Here is the output:

+-------------------+
| PY_UDF_NULL(NULL) |
|-------------------|
| SQL null          |
+-------------------+

+---------------------------------+
| PY_UDF_NULL(PARSE_JSON('NULL')) |
|---------------------------------|
| JSON null                       |
+---------------------------------+

+-----------------+
| PY_UDF_NULL(10) |
|-----------------|
| not null        |
+-----------------+
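The handler's branching can be exercised locally with a stand-in for the wrapper object Snowflake passes for SQL NULL. The SqlNull class below is hypothetical; inside Snowflake, the runtime supplies an object exposing the is_sql_null attribute, and JSON null arrives as Python None.

```python
class SqlNull:
    # Hypothetical stand-in for Snowflake's SQL NULL wrapper, which
    # exposes an is_sql_null attribute; the real object comes from the runtime.
    is_sql_null = True

def udf(a):
    if not a:
        return 'JSON null'
    elif getattr(a, "is_sql_null", False):
        return 'SQL null'
    else:
        return 'not null'
```

With this stand-in, `udf(None)` yields 'JSON null', `udf(SqlNull())` yields 'SQL null', and `udf(10)` yields 'not null', matching the query output above.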