snowflake.snowpark.functions.udtf
- snowflake.snowpark.functions.udtf(handler: Optional[Callable] = None, *, output_schema: Union[StructType, List[str]], input_types: Optional[List[DataType]] = None, name: Optional[Union[str, Iterable[str]]] = None, is_permanent: bool = False, stage_location: Optional[str] = None, imports: Optional[List[Union[str, Tuple[str, str]]]] = None, packages: Optional[List[Union[str, module]]] = None, replace: bool = False, if_not_exists: bool = False, session: Optional[Session] = None, parallel: int = 4, statement_params: Optional[Dict[str, str]] = None, strict: bool = False, secure: bool = False) -> Union[UserDefinedTableFunction, partial]
Registers a Python class as a Snowflake Python UDTF and returns the UDTF.
It can be used as either a function call or a decorator. In most cases you work with a single session, and this function uses that session to register the UDTF. If you have multiple sessions, you need to explicitly specify the session parameter of this function. If you have a handler and would like to register it to multiple databases, use session.udtf.register instead. See examples in UDTFRegistration.
- Parameters:
handler – A Python class used for creating the UDTF.
output_schema – A list of column names, or a StructType instance that represents the table function’s columns. If a list of column names is provided, the process method of the handler class must have return type hints to indicate the output schema data types.
input_types – A list of DataType representing the input data types of the UDTF. Optional if type hints are provided.
name – A string or list of strings that specify the name or fully-qualified object identifier (database name, schema name, and function name) for the UDTF in Snowflake, which allows you to call this UDTF in a SQL command or via call_udtf(). If it is not provided, a name will be automatically generated for the UDTF. A name must be specified when is_permanent is True.
is_permanent – Whether to create a permanent UDTF. The default is False. If it is True, a valid stage_location must be provided.
stage_location – The stage location where the Python file for the UDTF and its dependencies should be uploaded. The stage location must be specified when is_permanent is True, and it will be ignored when is_permanent is False. It can be any stage other than temporary stages and external stages.
imports – A list of imports that only apply to this UDTF. You can use a string to represent a file path (similar to the path argument in add_import()) in this list, or a tuple of two strings to represent a file path and an import path (similar to the import_path argument in add_import()). These UDTF-level imports will override the session-level imports added by add_import().
packages – A list of packages that only apply to this UDTF. These UDTF-level packages will override the session-level packages added by add_packages() and add_requirements(). To use Python packages that are not available in Snowflake, refer to custom_package_usage_config().
replace – Whether to replace a UDTF that was already registered. The default is False. If it is False, attempting to register a UDTF with a name that already exists raises a SnowparkSQLException. If it is True, an existing UDTF with the same name is overwritten.
if_not_exists – Whether to skip creation of a UDTF when one with the same signature already exists. The default is False. if_not_exists and replace are mutually exclusive, and a ValueError is raised when both are set. If it is True and a UDTF with the same signature exists, the UDTF creation is skipped.
session – Use this session to register the UDTF. If it’s not specified, the session that you created before calling this function will be used. You need to specify this parameter if you have created multiple sessions before calling this method.
parallel – The number of threads to use for uploading UDTF files with the PUT command. The default value is 4 and supported values are from 1 to 99. Increasing the number of threads can improve performance when uploading large UDTF files.
statement_params – Dictionary of statement-level parameters to be set while executing this action.
strict – Whether the created UDTF is strict. A strict UDTF is not invoked if any input is null; instead, a null value is always returned for that row. Note that the UDTF might still return null for non-null inputs.
secure – Whether the created UDTF is secure. For more information about secure functions, see Secure UDFs.
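As a hedged illustration of the output_schema and input_types parameters above, the sketch below uses a hypothetical handler named WordLengths and a hypothetical UDTF name word_lengths. Because output_schema is given as a list of column names, the process method carries return type hints, and input_types is omitted because the argument is annotated as well. The Snowpark import is placed inside the helper (an assumption made for convenience) so the handler can be exercised locally without a Snowflake connection.

```python
from typing import Iterable, Tuple


class WordLengths:
    """Hypothetical handler: emits one (word, length) row per word."""

    def process(self, text: str) -> Iterable[Tuple[str, int]]:
        for word in text.split():
            yield (word, len(word))


def register_word_lengths(session):
    """Register the handler as a temporary UDTF.

    `session` is assumed to be an existing snowflake.snowpark.Session.
    """
    # Imported lazily so the handler above can be tested without Snowpark.
    from snowflake.snowpark.functions import udtf

    return udtf(
        WordLengths,
        output_schema=["word", "length"],  # names only; types come from the hints
        name="word_lengths",               # hypothetical name
        replace=True,                      # overwrite an earlier registration
        session=session,
    )


# Local check of the handler logic; no Snowflake connection is needed.
rows = list(WordLengths().process("a bc def"))
print(rows)
```

Calling register_word_lengths(session) is left to the reader, since it needs a live session; the local check only exercises the row-generating logic.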
- Returns:
A UDTF function that can be called with Column expressions.
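A minimal sketch of how the returned object might be used, following the pattern in the examples for this function: the UDTF is called with Column expressions (here lit(...)) and the result is passed to session.table_function. The handler Countdown and the helper run_countdown are hypothetical names, and session is assumed to be an existing Session.

```python
from typing import Iterable, Tuple


class Countdown:
    """Hypothetical handler: yields n, n-1, ..., 1 as single-column rows."""

    def process(self, n: int) -> Iterable[Tuple[int]]:
        for value in range(n, 0, -1):
            yield (value,)


def run_countdown(session, start: int):
    """Register Countdown on `session` and query it once.

    `session` is assumed to be an existing snowflake.snowpark.Session.
    """
    # Imported lazily so the handler above can be tested without Snowpark.
    from snowflake.snowpark.functions import lit, udtf

    countdown_udtf = udtf(Countdown, output_schema=["value"], session=session)
    # The returned UDTF is called with a Column expression, not a raw int.
    return session.table_function(countdown_udtf(lit(start))).collect()


# Local check of the handler logic; no Snowflake connection is needed.
local_rows = list(Countdown().process(3))
print(local_rows)
```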
Note
1. When type hints are provided and are complete for a function, return_type and input_types are optional and will be ignored. See details of supported data types for UDTFs in UDTFRegistration.
   - You can use Variant to annotate a variant, and use Geography or Geometry to annotate geospatial types when defining a UDTF.
   - typing.Union is not a valid type annotation for UDTFs, but typing.Optional can be used to indicate the optional type.
   - Type hints are not supported on functions decorated with decorators.
2. A temporary UDTF (when is_permanent is False) is scoped to this session and all UDTF related files will be uploaded to a temporary session stage (session.get_session_stage()). For a permanent UDTF, these files will be uploaded to the stage that you specify.
3. By default, UDTF registration fails if a function with the same name is already registered. Invoking udtf() with replace set to True will overwrite the previously registered function.
Example:
>>> from snowflake.snowpark.functions import lit, udtf
>>> from snowflake.snowpark.types import IntegerType, StructField, StructType
>>> class PrimeSieve:
...     def process(self, n):
...         is_prime = [True] * (n + 1)
...         is_prime[0] = False
...         is_prime[1] = False
...         p = 2
...         while p * p <= n:
...             if is_prime[p]:
...                 # set all multiples of p to False
...                 for i in range(p * p, n + 1, p):
...                     is_prime[i] = False
...             p += 1
...         # yield all prime numbers
...         for p in range(2, n + 1):
...             if is_prime[p]:
...                 yield (p,)
>>> prime_udtf = udtf(PrimeSieve, output_schema=StructType([StructField("number", IntegerType())]), input_types=[IntegerType()])
>>> session.table_function(prime_udtf(lit(20))).collect()
[Row(NUMBER=2), Row(NUMBER=3), Row(NUMBER=5), Row(NUMBER=7), Row(NUMBER=11), Row(NUMBER=13), Row(NUMBER=17), Row(NUMBER=19)]

Instead of calling `udtf` it is also possible to use udtf as a decorator.
Example:
>>> @udtf(name="alt_int", replace=True, output_schema=StructType([StructField("number", IntegerType())]), input_types=[IntegerType()])
... class Alternator:
...     def __init__(self):
...         self._positive = True
...
...     def process(self, n):
...         for i in range(n):
...             if self._positive:
...                 yield (1,)
...             else:
...                 yield (-1,)
...             self._positive = not self._positive
>>> session.table_function("alt_int", lit(3)).collect()
[Row(NUMBER=1), Row(NUMBER=-1), Row(NUMBER=1)]
>>> session.table_function("alt_int", lit(2)).collect()
[Row(NUMBER=1), Row(NUMBER=-1)]
>>> session.table_function("alt_int", lit(1)).collect()
[Row(NUMBER=1)]
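Because a handler is an ordinary Python class, its logic can be checked locally before registration. The sketch below re-creates the Alternator handler without the decorator (an assumption made so that it runs without a Snowflake session) and drives process directly. The helper run_locally is a hypothetical name. Using a fresh instance per call mirrors the results above, where each query starts again at 1; how handler instances are created on the server is not covered here.

```python
class Alternator:
    """Same logic as the decorated handler, kept undecorated for local testing."""

    def __init__(self):
        self._positive = True

    def process(self, n):
        for _ in range(n):
            yield (1,) if self._positive else (-1,)
            # State on the instance persists between yielded rows.
            self._positive = not self._positive


def run_locally(n):
    """Hypothetical helper: run process on a fresh handler instance."""
    return list(Alternator().process(n))


print(run_locally(3))

# Reusing one instance shows that the sign carries over between calls.
shared = Alternator()
first = list(shared.process(1))
second = list(shared.process(1))
print(first, second)
```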