snowflake.snowpark.dataframe.map_in_pandas
- snowflake.snowpark.dataframe.map_in_pandas(dataframe: DataFrame, func: Callable, schema: Union[StructType, str], *, partition_by: Optional[Union[Column, str, List[Union[Column, str]]]] = None, imports: Optional[List[Union[str, Tuple[str, str]]]] = None, packages: Optional[List[Union[str, module]]] = None, immutable: bool = False, max_batch_size: Optional[int] = None)
Returns a new DataFrame with the result of applying func to each batch of data in the dataframe. func is expected to be a Python function that takes an iterator of pandas DataFrames as input and returns an iterator of pandas DataFrames as output. The number of input and output DataFrame batches can differ.
This function registers a temporary UDTF (user-defined table function).
- Parameters:
dataframe – The DataFrame instance.
func – A function to be applied to the batches of rows.
schema – A StructType or type string that represents the expected output schema of the func parameter.
partition_by – A column or list of columns that will be used to partition the data before passing it to the func.
imports – A list of imports that are required to run the function. This argument is passed on when registering the UDTF.
packages – A list of packages that are required to run the function. This argument is passed on when registering the UDTF.
immutable – A flag to specify if the result of the func is deterministic for the same input.
max_batch_size – The maximum number of rows per input pandas DataFrame when using the vectorized option.
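The contract for func can be tried out locally with plain pandas, without a Snowpark session. The following is a minimal sketch, not taken from the library's own examples: the function names add_doubled and sum_all, the column names ID, DOUBLED and TOTAL, and the hand-built list of batches are all hypothetical. It shows one function that yields one output batch per input batch and one that collapses all input batches into a single output batch.

```python
from typing import Iterator

import pandas as pd


def add_doubled(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Yield one output DataFrame per input batch, with an extra column."""
    for batch in batches:
        out = batch.copy()  # leave the input batch unmodified
        out["DOUBLED"] = out["ID"] * 2
        yield out


def sum_all(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Consume every input batch and yield a single one-row DataFrame."""
    total = 0
    for batch in batches:
        total += int(batch["ID"].sum())
    yield pd.DataFrame({"TOTAL": [total]})


# Hypothetical local stand-in for the batches that would be passed to func.
input_batches = [pd.DataFrame({"ID": [1, 2]}), pd.DataFrame({"ID": [3]})]

doubled = list(add_doubled(iter(input_batches)))  # two batches in, two out
summed = list(sum_all(iter(input_batches)))       # two batches in, one out
```

A function shaped like add_doubled or sum_all would be passed as func, together with a schema describing its output columns. How rows are actually grouped into batches is decided by Snowflake and by partition_by and max_batch_size, so func should not rely on a particular batch layout.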
Example 1:
Example 2:
Example 3:
Example 4: