modin.pandas.Series.map

Series.map(arg, na_action=None) → Series
Map values of Series according to an input mapping or function.
Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series.

Parameters:
arg (function, collections.abc.Mapping subclass or Series) – Mapping correspondence. Only function is currently supported by Snowpark pandas.
na_action ({None, 'ignore'}, default None) – If ‘ignore’, propagate NULL values without passing them to the mapping correspondence. Note that it will not bypass NaN values in a FLOAT column in Snowflake. ‘ignore’ is currently not supported by Snowpark pandas.
Returns:
    Same index as caller.

Return type:
    Series
See also

Series.apply
    For applying more complex functions on a Series.

DataFrame.apply
    Apply a function row-/column-wise.

DataFrame.applymap
    Apply a function elementwise on a whole DataFrame.

Notes
When arg is a dictionary, values in Series that are not in the dictionary (as keys) are converted to NaN. However, if the dictionary is a dict subclass that defines __missing__ (i.e. provides a method for default values), then this default is used rather than NaN.

Examples
>>> s = pd.Series(['cat', 'dog', None, 'rabbit'])
>>> s
0       cat
1       dog
2      None
3    rabbit
dtype: object
map accepts a dict or a Series. Values that are not found in the dict are converted to NaN, unless the dict has a default value (e.g. defaultdict):

>>> s.map({'cat': 'kitten', 'dog': 'puppy'})
0    kitten
1     puppy
2      None
3      None
dtype: object
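For completeness, a minimal sketch of the Series form of arg, following the native pandas docstring behavior (Snowpark pandas currently supports only functions, so this is illustrative; in native pandas, missing values render as NaN rather than None):

>>> mapping = pd.Series({'cat': 'kitten', 'dog': 'puppy'})
>>> s.map(mapping)  # values are looked up by the mapping Series' index
0    kitten
1     puppy
2       NaN
3       NaN
dtype: object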
It also accepts a function:
>>> s.map('I am a {}'.format)
0       I am a cat
1       I am a dog
2      I am a <NA>
3    I am a rabbit
dtype: object
To avoid applying the function to missing values (and keep them as NaN), na_action='ignore' can be used (currently not supported by Snowpark pandas):

>>> s.map('I am a {}'.format, na_action='ignore')
0       I am a cat
1       I am a dog
2             None
3    I am a rabbit
dtype: object
Note that in the above example the missing value in Snowflake is NULL; it is mapped to None in a string/object column.

Snowpark pandas does not yet support dict subclasses that define a __missing__ method, other than collections.defaultdict.
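A minimal sketch of the supported defaultdict case (the 'unknown' default factory is a hypothetical value for illustration; the output shown follows the native pandas __missing__ behavior):

>>> from collections import defaultdict
>>> d = defaultdict(lambda: 'unknown', {'cat': 'kitten', 'dog': 'puppy'})
>>> s.map(d)
0     kitten
1      puppy
2    unknown
3    unknown
dtype: object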
To generate a permanent UDF, pass a dictionary as the snowflake_udf_params argument to map. The following example generates a permanent UDF named “permanent_double”:
>>> session.sql("CREATE STAGE sample_upload_stage").collect()
>>> def double(x: str) -> str:
...     return x * 2
...
>>> s.map(double, snowflake_udf_params={"name": "permanent_double", "stage_location": "@sample_upload_stage"})
0          catcat
1          dogdog
2            None
3    rabbitrabbit
dtype: object
You may also pass “replace” and “if_not_exists” in the dictionary to overwrite or re-use an existing UDF.
With the “replace” flag:
>>> s.map(double, snowflake_udf_params={
...     "name": "permanent_double",
...     "stage_location": "@sample_upload_stage",
...     "replace": True,
... })
With the “if_not_exists” flag:
>>> s.map(double, snowflake_udf_params={
...     "name": "permanent_double",
...     "stage_location": "@sample_upload_stage",
...     "if_not_exists": True,
... })
Note that Snowpark pandas may still attempt to upload a new UDF even when “if_not_exists” is passed; the generated SQL will just contain a CREATE FUNCTION IF NOT EXISTS query instead. Subsequent calls to map within the same session may skip this query.
Passing the “immutable” flag creates an immutable UDF, which Snowflake assumes will return the same result for the same inputs:
>>> s.map(double, snowflake_udf_params={
...     "name": "permanent_double",
...     "stage_location": "@sample_upload_stage",
...     "replace": True,
...     "immutable": True,
... })