snowflake.snowpark.dataframe.map
- snowflake.snowpark.dataframe.map(dataframe: DataFrame, func: Callable, output_types: List[StructType], *, output_column_names: Optional[List[str]] = None, imports: Optional[List[Union[str, Tuple[str, str]]]] = None, packages: Optional[List[Union[str, module]]] = None, immutable: bool = False, partition_by: Optional[Union[Column, str, List[Union[Column, str]]]] = None, vectorized: bool = False, max_batch_size: Optional[int] = None)
Returns a new DataFrame with the result of applying func to each row of the specified DataFrame.
To do so, this function registers a temporary UDTF that runs func on every row.
- Parameters:
dataframe – The DataFrame instance.
func – A function to be applied to every row of the DataFrame.
output_types – A list of types for values generated by the func.
output_column_names – A list of names to be assigned to the resulting columns.
imports – A list of imports that are required to run the function. This argument is passed on when registering the UDTF.
packages – A list of packages that are required to run the function. This argument is passed on when registering the UDTF.
immutable – A flag to specify whether func is deterministic, that is, whether it always returns the same result for the same input.
partition_by – Specify the partitioning column(s) for the UDTF.
vectorized – A flag to determine if the UDTF process should be vectorized. See vectorized UDTFs.
max_batch_size – The maximum number of rows per input pandas DataFrame when the vectorized option is used.
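The parameters above can be combined as in the following minimal sketch. It is an illustration, not an official example: the helper name square_first_column and the output column name col1_squared are hypothetical, and df is assumed to be an existing Snowpark DataFrame whose first column is an integer. The Snowpark imports are done inside the function so that the snippet can be loaded without a Snowflake connection.

```python
def square_first_column(df):
    """Return a new DataFrame holding the square of each row's first column.

    Assumption: `df` is a Snowpark DataFrame created from an active session,
    and its first column holds integers.
    """
    # Imported lazily so this module loads even where Snowpark is not installed.
    # The alias avoids shadowing Python's built-in map().
    from snowflake.snowpark.dataframe import map as df_map
    from snowflake.snowpark.types import IntegerType

    return df_map(
        df,
        lambda row: row[0] * row[0],           # func: called once per input row
        output_types=[IntegerType()],          # one type per value func returns
        output_column_names=["col1_squared"],  # hypothetical result column name
    )
```

Importing map under an alias such as df_map is a design choice for readability only; it keeps the built-in map available in the same module.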
Example 1:
Example 2:
Example 3:
Example 4:
Example 5:
Example 6:
Note
1. func must return either a scalar value or a tuple containing the same number of elements as the output_types argument.
2. When using the vectorized option, the func function must accept a pandas DataFrame as input and return either a pandas DataFrame, or a tuple of pandas Series/arrays.
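The rule in note 1 can be modeled locally in plain Python. The sketch below is not how Snowpark implements map; apply_row_func is a hypothetical helper that only mimics the contract on a list of tuples: a scalar result counts as a single output value, and a tuple result must have exactly as many elements as there are output types.

```python
from typing import Any, Callable, List, Sequence, Tuple


def apply_row_func(
    rows: Sequence[Tuple[Any, ...]],
    func: Callable[[Tuple[Any, ...]], Any],
    n_outputs: int,
) -> List[Tuple[Any, ...]]:
    """Apply `func` to each row and check the result against `n_outputs`.

    `n_outputs` stands in for len(output_types) in this local model.
    """
    results = []
    for row in rows:
        value = func(row)
        # A scalar result is treated as a one-element tuple.
        out = value if isinstance(value, tuple) else (value,)
        if len(out) != n_outputs:
            raise ValueError(
                f"func returned {len(out)} value(s), expected {n_outputs}"
            )
        results.append(out)
    return results


rows = [(10, "a", 22), (20, "b", 22)]

# Scalar result, one output type.
squares = apply_row_func(rows, lambda r: r[0] * r[0], 1)

# Tuple result, two output types.
pairs = apply_row_func(rows, lambda r: (r[0] + 1, r[1] * 2), 2)
```

Here squares holds one value per row and pairs holds two; passing n_outputs=2 with the scalar-returning function raises ValueError, which mirrors the mismatch the note warns about.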