Introduction to Python UDFs¶
This topic introduces Python UDFs and provides information to help you decide when to use a Python UDF.
In this Topic:
What is a Python UDF?¶
A UDF (user-defined function) is a user-written function that can be called from Snowflake in the same way that a built-in function can be called.
Snowflake supports UDFs written in multiple languages, including Python.
Python UDFs are scalar functions; for each row passed to the UDF, the UDF returns a value.
UDFs accept 0 or more parameters.
When a user calls a UDF, the user passes the name of the UDF and the parameters of the UDF to Snowflake. If the UDF is a Python UDF, Snowflake calls the appropriate Python code (called a handler function). The handler function then returns the output to Snowflake, which passes it back to the client.
Python UDFs can contain both new code and calls to existing packages, allowing you both flexibility and code reuse. For example, if you already have data analysis code in Python, then you can probably incorporate that into a Python UDF.
Snowflake currently supports writing UDFs in Python version 3.8.
Through a partnership with Anaconda, 3rd party Python packages are provided by Snowflake. To learn more, see Using Third-Party Packages.
Deciding When to Use a Python UDF¶
For information about other potential languages in which to write UDFs, and for comparisons among those languages, see Overview of UDFs.
For information about other ways to extend Snowflake, see:
Advantages of Python UDFs¶
Python UDFs are particularly appropriate when one or more of the following are true:
You already have Python code (source or compiled) that you can use.
Your code uses (or could use) functions that already exist in standard Python packages.
You know Python as well as or better than the other languages that support UDFs.
You want to take advantage of Python’s rich 3rd-party ecosystem.
Limitations on Python UDFs¶
Although your Python function can use modules and functions in the standard Python packages, Snowflake security constraints disable some capabilities, such as network access and writing to files. For details, see the section titled Following Good Security Practices.
All UDFs and modules brought in through stages must be platform-independent and must not contain native extensions.
Avoid code that assumes a specific CPU architecture (e.g. x86).
Avoid code that assumes a specific operating system.
Python UDFs are not sharable. Database objects that use Python UDFs are also not sharable. For example, you cannot:
Directly share a Python UDF.
Share a view that calls a Python UDF.
Share a function that calls a Python UDF.
Share a table with a masking or row access policy that calls a Python UDF.
If you try to create a Python UDF using the SECURE option (
CREATE SECURE FUNCTION...), Snowflake returns an error. The SECURE option is not yet supported for Python UDFs.
Granting USAGE privilege on a Python UDF might allow the recipient to see the contents of files imported by that UDF. If you grant the USAGE privilege on a Python UDF to a role, and if that role executes a statement that calls that Python UDF, then any Python UDF in the same statement could read the contents of any files imported by the Python UDF on which you granted USAGE privilege.
Database replication is supported for in-line Python UDFs. However, replication is blocked if a Python UDF has a dependency on a file in a stage (i.e. a function created using the IMPORTS clause). This limitation might be removed in future versions.
Snowflake uses the Python
zipimportmodule to import Python code from stages. As a result, any
zipimportlimitations will also be present with UDFs. For more about
zipimport, see the zipimport reference.
Limitations on Cloning¶
A Python UDF can be cloned when the database or schema containing the Python UDF is cloned. To be cloned, the Python UDF must meet the following condition(s):
If the Python UDF references a stage, that stage must be outside the schema (or database) being cloned.
You can keep a Python UDF and its referenced stage(s) in separate schemas (and/or separate databases) the following ways:
Wherever the Python UDF references a stage, use a qualified stage name (e.g. “my_db.my_schema.my_stage()”) different from the schema or database of the Python UDF. If the cloning operation clones a database, the stage reference should include the database and schema. If the cloning operation clones a schema, the stage reference should include the schema (and optionally the database).
Create the referenced stage by using a non-qualified stage name (which implicitly uses the current session’s active database and schema), and create the Python UDF by using a qualified name that does not match the session’s current database and schema.
Use the user’s stage as the referenced stage (the user’s stage is separate from any database’s stage or schema’s stage).
If one or more Python UDFs in the schema or database do not meet the required conditions, the schema or database can still be cloned, but the non-compliant Python UDFs are omitted from the clone without any error or warning message.
Each cloned Python UDF has the same definition as the original. That definition includes any references to stages. The stage references in the Python UDF must be fully-qualified, and therefore are absolute, not relative to the schema or database being cloned. Because both the original and the clone point to the same stage(s) and file(s):
Dropping the stage or removing required files from the stage disables both the original and cloned UDF.
Altering the stage or the files on the stage affects both the original and cloned UDF.