You are viewing documentation about an older version (1.3.0).

snowflake.snowpark.functions.pandas_udf

snowflake.snowpark.functions.pandas_udf(func: Callable | None = None, *, return_type: DataType | None = None, input_types: List[DataType] | None = None, name: str | Iterable[str] | None = None, is_permanent: bool = False, stage_location: str | None = None, imports: List[str | Tuple[str, str]] | None = None, packages: List[str | module] | None = None, replace: bool = False, if_not_exists: bool = False, session: Session | None = None, parallel: int = 4, max_batch_size: int | None = None, statement_params: Dict[str, str] | None = None, source_code_display: bool = True) → UserDefinedFunction | partial

Registers a Python function as a vectorized UDF and returns the UDF. The arguments, return value and usage of this function are exactly the same as udf(), but this function can only be used for registering vectorized UDFs. See examples in UDFRegistration.
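To make "vectorized" concrete: the registered function receives a batch of rows as a pandas object and returns a pandas Series with one value per input row, instead of being called once per row. The sketch below runs locally with pandas only and needs no Snowflake session. The helper name add_one_batch and the sample data are hypothetical, and the integer column labels 0 and 1 are an assumption taken from the df[0] and df[1] indexing used in the examples on this page.

```python
import pandas as pd


def add_one_batch(df: pd.DataFrame) -> pd.Series:
    # Same body as the lambda in the examples on this page:
    # operate on whole columns at once instead of looping over rows.
    return df[0] + df[1] + 1


# A stand-in for one input batch; pandas assigns the default
# integer column labels 0 and 1 when no column names are given.
batch = pd.DataFrame([[1, 2], [3, 4]])

result = add_one_batch(batch)
print(result.tolist())  # [4, 8]
```

Checking the function against a small local DataFrame like this can catch shape or dtype mistakes before the function is registered.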

Example:

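>>> # `session` is assumed to be an existing snowflake.snowpark.Session
>>> from snowflake.snowpark.functions import pandas_udf
>>> from snowflake.snowpark.types import PandasSeriesType, PandasDataFrameType, IntegerType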
>>> add_one_df_pandas_udf = pandas_udf(
...     lambda df: df[0] + df[1] + 1,
...     return_type=PandasSeriesType(IntegerType()),
...     input_types=[PandasDataFrameType([IntegerType(), IntegerType()])]
... )
>>> df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
>>> df.select(add_one_df_pandas_udf("a", "b").alias("result")).order_by("result").show()
------------
|"RESULT"  |
------------
|4         |
|8         |
------------

Pandas UDFs can also be registered as named UDFs that are accessible in the same session. Instead of calling pandas_udf as a function, you can also use it as a decorator:

Example:

>>> from snowflake.snowpark.functions import pandas_udf
>>> from snowflake.snowpark.types import PandasSeriesType, PandasDataFrameType, IntegerType
>>> @pandas_udf(
...     return_type=PandasSeriesType(IntegerType()),
...     input_types=[PandasDataFrameType([IntegerType(), IntegerType()])],
... )
... def add_one_df_pandas_udf(df):
...     return df[0] + df[1] + 1
>>> df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
>>> df.select(add_one_df_pandas_udf("a", "b").alias("result")).order_by("result").show()
------------
|"RESULT"  |
------------
|4         |
|8         |
------------
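The signature above includes a max_batch_size parameter, which this page does not describe. Assuming it caps the number of rows handed to the function in each call, a vectorized function should give the same answer however its input is split. The following local simulation illustrates that property with pandas only; apply_in_batches is a hypothetical helper written for this sketch, not part of Snowpark, and it does not reflect how Snowflake actually schedules batches.

```python
import pandas as pd


def add_one_batch(df: pd.DataFrame) -> pd.Series:
    # Same body as the UDF in the examples on this page.
    return df[0] + df[1] + 1


def apply_in_batches(func, data: pd.DataFrame, max_batch_size: int) -> pd.Series:
    # Hypothetical helper: split the rows into chunks of at most
    # max_batch_size, call func on each chunk, and stitch the results together.
    parts = []
    for start in range(0, len(data), max_batch_size):
        chunk = data.iloc[start:start + max_batch_size].reset_index(drop=True)
        parts.append(func(chunk))
    return pd.concat(parts, ignore_index=True)


data = pd.DataFrame([[1, 2], [3, 4], [5, 6]])

whole = add_one_batch(data)
batched = apply_in_batches(add_one_batch, data, max_batch_size=2)

print(whole.tolist())    # [4, 8, 12]
print(batched.tolist())  # [4, 8, 12]
```

A function that depends on the whole input at once, for example one that subtracts the column mean, would not pass this check, so it is a quick way to spot logic that is unsafe to run batch by batch.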