snowflake.snowpark.functions.udf
- snowflake.snowpark.functions.udf(func: Callable | None = None, *, return_type: DataType | None = None, input_types: List[DataType] | None = None, name: str | Iterable[str] | None = None, is_permanent: bool = False, stage_location: str | None = None, imports: List[str | Tuple[str, str]] | None = None, packages: List[str | module] | None = None, replace: bool = False, if_not_exists: bool = False, session: Session | None = None, parallel: int = 4, max_batch_size: int | None = None, statement_params: Dict[str, str] | None = None, source_code_display: bool = True, strict: bool = False, secure: bool = False) -> UserDefinedFunction | partial
Registers a Python function as a Snowflake Python UDF and returns the UDF.
It can be used either as a function call or as a decorator. In most cases you work with a single session, and this function uses that session to register the UDF. If you have multiple sessions, you need to explicitly specify the session parameter of this function. If you have a function and would like to register it to multiple databases, use session.udf.register instead. See examples in UDFRegistration.
- Parameters:
func – A Python function used for creating the UDF.
return_type – A DataType representing the return data type of the UDF. Optional if type hints are provided.
input_types – A list of DataType representing the input data types of the UDF. Optional if type hints are provided.
name – A string or list of strings that specify the name or fully-qualified object identifier (database name, schema name, and function name) for the UDF in Snowflake, which allows you to call this UDF in a SQL command or via call_udf(). If it is not provided, a name will be automatically generated for the UDF. A name must be specified when is_permanent is True.
is_permanent – Whether to create a permanent UDF. The default is False. If it is True, a valid stage_location must be provided.
stage_location – The stage location where the Python file for the UDF and its dependencies should be uploaded. The stage location must be specified when is_permanent is True, and it will be ignored when is_permanent is False. It can be any stage other than temporary stages and external stages.
imports – A list of imports that only apply to this UDF. You can use a string to represent a file path (similar to the path argument in add_import()) in this list, or a tuple of two strings to represent a file path and an import path (similar to the import_path argument in add_import()). These UDF-level imports will override the session-level imports added by add_import(). Note that an empty list means no import for this UDF, and None or not specifying this parameter means using session-level imports.
packages – A list of packages that only apply to this UDF. These UDF-level packages will override the session-level packages added by add_packages() and add_requirements(). Note that an empty list means no package for this UDF, and None or not specifying this parameter means using session-level packages.
replace – Whether to replace a UDF that was already registered. The default is False. If it is False, attempting to register a UDF with a name that already exists raises a SnowparkSQLException. If it is True, an existing UDF with the same name is overwritten.
if_not_exists – Whether to skip creation of a UDF when one with the same signature already exists. The default is False. if_not_exists and replace are mutually exclusive, and a ValueError is raised when both are set. If it is True and a UDF with the same signature exists, the UDF creation is skipped.
session – Use this session to register the UDF. If it’s not specified, the session that you created before calling this function will be used. You need to specify this parameter if you have created multiple sessions before calling this method.
parallel – The number of threads to use for uploading UDF files with the PUT command. The default value is 4 and supported values are from 1 to 99. Increasing the number of threads can improve performance when uploading large UDF files.
max_batch_size – The maximum number of rows per input pandas DataFrame or pandas Series inside a vectorized UDF. Because a vectorized UDF is executed within a time limit of 60 seconds, this optional argument can be used to reduce the running time of every batch by setting a smaller batch size. Note that setting a larger value does not guarantee that Snowflake will encode batches with the specified number of rows. It is ignored when registering a non-vectorized UDF.
statement_params – Dictionary of statement level parameters to be set while executing this action.
source_code_display – Display the source code of the UDF func as comments in the generated script. The source code is dynamically generated, so it may not be identical to how func was originally defined. The default is True. If it is False, source code will not be generated or displayed.
strict – Whether the created UDF is strict. A strict UDF is not invoked for a row in which any input is null; instead, a null value is always returned for that row. Note that the UDF might still return null for non-null inputs.
secure – Whether the created UDF is secure. For more information about secure functions, see Secure UDFs.
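The sketch below registers a permanent, named UDF with an explicit session, combining name, is_permanent, stage_location, replace, and session. It assumes a stage named @my_stage already exists (a hypothetical name) and that session is an existing Session. The handler is kept as a plain function so its logic can be checked locally without a Snowflake connection; Snowpark is imported only inside the registration helper for that reason.

```python
from typing import Optional


def safe_ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """UDF handler: returns None (SQL NULL) for a null input or a zero denominator."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def register_safe_ratio(session, stage_location: str = "@my_stage"):
    """Register safe_ratio as a permanent UDF.

    "@my_stage" is a hypothetical stage; it must be neither temporary nor external.
    """
    # Imported here so the handler above can be tested without Snowpark installed.
    from snowflake.snowpark.functions import udf

    return udf(
        safe_ratio,
        name="safe_ratio",              # required because is_permanent=True
        is_permanent=True,
        stage_location=stage_location,  # handler file and dependencies are uploaded here
        replace=True,                   # overwrite an existing UDF with the same name
        session=session,                # explicit, in case several sessions exist
    )
```

Passing if_not_exists=True instead of replace=True would keep an existing definition with the same signature; setting both raises a ValueError.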
- Returns:
A UDF function that can be called with Column expressions.
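As a sketch of how the returned object is used, the helper below registers a temporary, named UDF and applies it to a Column expression; because the UDF has a name, it can also be invoked by name through call_udf(). The 8% tax rate, the column name price, and the UDF name add_tax are illustrative assumptions, and session and df are assumed to be an existing Session and a DataFrame with a numeric price column.

```python
def with_tax(price: float) -> float:
    """UDF handler: applies a hypothetical 8% tax rate."""
    return price * 1.08


def add_tax_columns(session, df):
    # Imported here so the handler above can be tested without Snowpark installed.
    from snowflake.snowpark.functions import call_udf, col, udf

    # with_tax has complete type hints, so return_type and input_types are omitted.
    add_tax = udf(with_tax, name="add_tax", replace=True, session=session)

    # The returned UDF object is called on Column expressions.
    by_object = df.select(add_tax(col("price")).as_("with_tax"))
    # The same UDF can also be called by its registered name.
    by_name = df.select(call_udf("add_tax", col("price")).as_("with_tax"))
    return by_object, by_name
```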
Note
1. When type hints are provided and are complete for a function, return_type and input_types are optional and will be ignored. See details of supported data types for UDFs in UDFRegistration.
You can use Variant to annotate a variant, and use Geography to annotate a geography when defining a UDF.
You can use PandasSeries to annotate a pandas Series, and use PandasDataFrame to annotate a pandas DataFrame when defining a vectorized UDF. Note that they are generic types, so you can specify the element type in a pandas Series and DataFrame.
typing.Union is not a valid type annotation for UDFs, but typing.Optional can be used to indicate an optional type.
Type hints are not supported on functions decorated with decorators.
2. A temporary UDF (when is_permanent is False) is scoped to this session, and all UDF-related files will be uploaded to a temporary session stage (session.get_session_stage()). For a permanent UDF, these files will be uploaded to the stage that you provide.
3. By default, UDF registration fails if a function with the same name is already registered. Invoking udf() with replace set to True will overwrite the previously registered function.
4. When registering a vectorized UDF, the pandas library will be added as a package automatically, with the latest version available on the Snowflake server. If you don’t want to use this version, you can override it by adding pandas with a specific version requirement using the packages argument or add_packages().
See also
UDFs can be created as anonymous UDFs
Example:
>>> from snowflake.snowpark.functions import col, udf
>>> from snowflake.snowpark.types import IntegerType
>>> add_one = udf(lambda x: x+1, return_type=IntegerType(), input_types=[IntegerType()])
>>> df = session.create_dataframe([1, 2, 3], schema=["a"])
>>> df.select(add_one(col("a")).as_("ans")).collect()
[Row(ANS=2), Row(ANS=3), Row(ANS=4)]
or as named UDFs that are accessible in the same session. Instead of calling udf as a function, you can also use it as a decorator:
Example:
>>> @udf(name="minus_one", replace=True)
... def minus_one(x: int) -> int:
...     return x - 1
>>> df.select(minus_one(col("a")).as_("ans")).collect()
[Row(ANS=0), Row(ANS=1), Row(ANS=2)]
>>> session.sql("SELECT minus_one(10)").collect()
[Row(MINUS_ONE(10)=9)]
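Following note 1, a function with complete type hints can be registered without return_type or input_types, and typing.Optional marks arguments and results that may be null. The sketch below assumes an existing session; the UDF name initials is hypothetical, and Snowpark is imported inside the helper so the handler can be tested locally.

```python
from typing import Optional


def initials(first: Optional[str], last: Optional[str]) -> Optional[str]:
    """UDF handler: 'ada', 'lovelace' -> 'AL'; returns None if either part is missing."""
    if not first or not last:
        return None
    return (first[0] + last[0]).upper()


def register_initials(session):
    # Imported here so the handler above can be tested without Snowpark installed.
    from snowflake.snowpark.functions import udf

    # No return_type/input_types: they are inferred from the complete type hints.
    # typing.Union is not a valid UDF annotation; typing.Optional is accepted.
    return udf(initials, name="initials", replace=True, session=session)
```

Once registered, the UDF could be called from SQL in the same session, for example with SELECT initials('ada', 'lovelace').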