snowflake.snowpark.DataFrame.replace
- DataFrame.replace(to_replace: Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict, Iterable[Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict]], Dict[Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict], Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict]]], value: Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict, Iterable[Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict]]] = None, subset: Optional[Union[str, Iterable[str]]] = None) -> DataFrame
Returns a new DataFrame that replaces values in the specified columns.
- Parameters:
to_replace – A scalar value, a list of values, or a dict that maps original values to their replacement values. If to_replace is a dict, value and subset are ignored. To replace a null value, use None in to_replace. To replace a NaN value, use float("nan") in to_replace. If to_replace is empty, the method returns the original DataFrame.
value – A scalar value or a list of values to use as replacements. If value is a list, it should have the same length as to_replace. If value is a scalar and to_replace is a list, value is used as the replacement for each item in to_replace.
subset – A list of the names of the columns in which values should be replaced. If subset is not provided or is None, the replacement is applied to all columns. If subset is empty, the method returns the original DataFrame.
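The way to_replace and value combine can be sketched in plain Python. This is only an illustrative model of the rules described above, not Snowpark's implementation: the helper name normalize_replacement is hypothetical, and raising ValueError on a length mismatch is an assumption of this sketch.

```python
from typing import Any, List, Tuple


def normalize_replacement(to_replace: Any, value: Any = None) -> List[Tuple[Any, Any]]:
    """Hypothetical helper: expand (to_replace, value) into (old, new) pairs."""
    # A dict already pairs each original value with its replacement,
    # so `value` is ignored in this case.
    if isinstance(to_replace, dict):
        return list(to_replace.items())

    # Treat a scalar to_replace as a one-element list.
    if not isinstance(to_replace, (list, tuple)):
        to_replace = [to_replace]
    olds = list(to_replace)

    # A list of replacements is matched to to_replace position by position.
    if isinstance(value, (list, tuple)):
        if len(value) != len(olds):
            # Assumption of this sketch: a length mismatch is an error.
            raise ValueError("value must have the same length as to_replace")
        return list(zip(olds, value))

    # A scalar replacement is reused for every item in to_replace.
    return [(old, value) for old in olds]


print(normalize_replacement([1, 2], [3, 4]))  # pairs 1 with 3 and 2 with 4
print(normalize_replacement([1, 2], 3))       # pairs both 1 and 2 with 3
print(normalize_replacement([]))              # no pairs: nothing to replace
```

An empty result corresponds to the case where the method returns the original DataFrame unchanged.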
Examples:
>>> df = session.create_dataframe([[1, 1.0, "1.0"], [2, 2.0, "2.0"]], schema=["a", "b", "c"])
>>> # replace 1 with 3 in all columns
>>> df.na.replace(1, 3).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|2    |2.0  |2.0  |
-------------------
>>> # replace 1 with 3 and 2 with 4 in all columns
>>> df.na.replace([1, 2], [3, 4]).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|4    |4.0  |2.0  |
-------------------
>>> # replace 1 with 3 and 2 with 3 in all columns
>>> df.na.replace([1, 2], 3).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|3    |3.0  |2.0  |
-------------------
>>> # the following line replaces 1 with 3 and 2 with 4 in all columns
>>> # and will give [Row(3, 3.0, "1.0"), Row(4, 4.0, "2.0")]
>>> df.na.replace({1: 3, 2: 4}).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|4    |4.0  |2.0  |
-------------------
>>> # the following line intends to replace 1 with "3" in column "a",
>>> # but will be ignored since "3" (str) doesn't match the original data type
>>> df.na.replace({1: "3"}, ["a"]).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|1    |1.0  |1.0  |
|2    |2.0  |2.0  |
-------------------
Note
If the type of a given value in to_replace or value doesn’t match the column data type (e.g. a float for a StringType column), the replacement is skipped for that column. In particular, an int can replace or be replaced in a FloatType or DoubleType column, but a float cannot replace or be replaced in an IntegerType or LongType column. None can replace or be replaced in a column of any data type.
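The type-matching rule in this note can be modelled in a few lines of plain Python. This is a rough sketch rather than Snowpark's actual check: the function names replacement_applies and pair_applies are hypothetical, column types are represented by their names as strings, only the types named in the note are covered, and treating str as matching only StringType is an assumption.

```python
from typing import Any

# Column type names mentioned in the note, grouped by kind.
INTEGER_TYPES = {"IntegerType", "LongType"}
FLOAT_TYPES = {"FloatType", "DoubleType"}


def replacement_applies(value: Any, column_type: str) -> bool:
    """Hypothetical check: may `value` replace or be replaced in this column?"""
    # None is compatible with a column of any data type.
    if value is None:
        return True
    # int works in integer columns and also in float columns.
    if type(value) is int:
        return column_type in INTEGER_TYPES or column_type in FLOAT_TYPES
    # float works only in float columns, not in integer columns.
    if type(value) is float:
        return column_type in FLOAT_TYPES
    # Assumption of this sketch: str matches StringType only.
    if type(value) is str:
        return column_type == "StringType"
    # Other Python types are not covered by the note, so the sketch skips them.
    return False


def pair_applies(old: Any, new: Any, column_type: str) -> bool:
    """A replacement is skipped unless both sides match the column type."""
    return replacement_applies(old, column_type) and replacement_applies(new, column_type)


print(pair_applies(1, 3, "DoubleType"))    # int is accepted in a DoubleType column
print(pair_applies(1.0, 3.0, "LongType"))  # float is rejected in a LongType column
print(pair_applies(1, "3", "LongType"))    # str replacement is rejected in a LongType column
```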
See also