snowflake.snowpark.DataFrame.replace

DataFrame.replace(to_replace: Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, list, tuple, dict, Iterable[Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, list, tuple, dict]], Dict[Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, list, tuple, dict], Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, list, tuple, dict]]], value: Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, list, tuple, dict, Iterable[Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, list, tuple, dict]]] = None, subset: Optional[Iterable[str]] = None) → DataFrame

Returns a new DataFrame that replaces values in the specified columns.

Parameters:
  • to_replace – A scalar value, a list of values, or a dict that associates the original values with the replacement values. If to_replace is a dict, value and subset are ignored. To replace a null value, use None in to_replace. To replace a NaN value, use float("nan") in to_replace. If to_replace is empty, the method returns the original DataFrame.

  • value – A scalar value, or a list of replacement values. If value is a list, it must have the same length as to_replace. If value is a scalar and to_replace is a list, value is used as the replacement for every item in to_replace.

  • subset – A list of the names of the columns in which values should be replaced. If subset is not provided or is None, the replacement is applied to all columns. If subset is empty, the method returns the original DataFrame.
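How to_replace and value combine into a set of replacements can be modeled in plain Python. The helper below, build_replacement, is hypothetical and not part of Snowpark; it is a minimal sketch that mirrors only the rules stated above (a dict is used as-is, a list is paired element-wise with a list of equal length, and a scalar value is broadcast over a list). It treats any list or tuple in to_replace as a list of originals, and rejecting a list value for a scalar to_replace is this sketch's own choice.

```python
from typing import Any, Dict


def build_replacement(to_replace: Any, value: Any = None) -> Dict[Any, Any]:
    """Hypothetical helper (not part of Snowpark) that models how
    to_replace and value combine into an {original: replacement} mapping."""
    # A dict already maps originals to replacements; value is ignored.
    if isinstance(to_replace, dict):
        return dict(to_replace)
    if isinstance(to_replace, (list, tuple)):
        if isinstance(value, (list, tuple)):
            # Element-wise pairing: both lists must have the same length.
            if len(value) != len(to_replace):
                raise ValueError("to_replace and value must have the same length")
            return dict(zip(to_replace, value))
        # A scalar value is the replacement for every item in to_replace.
        return {original: value for original in to_replace}
    if isinstance(value, (list, tuple)):
        # This sketch rejects a list of replacements for a single scalar.
        raise ValueError("a list value requires a list to_replace")
    # Scalar to_replace with a scalar value.
    return {to_replace: value}
```

For example, build_replacement([1, 2], 3) returns {1: 3, 2: 3}, the same pairs applied by df.na.replace([1, 2], 3), and an empty to_replace yields an empty mapping, which corresponds to returning the original DataFrame unchanged.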

Examples:

>>> df = session.create_dataframe([[1, 1.0, "1.0"], [2, 2.0, "2.0"]], schema=["a", "b", "c"])
>>> # replace 1 with 3 in all columns
>>> df.na.replace(1, 3).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|2    |2.0  |2.0  |
-------------------

>>> # replace 1 with 3 and 2 with 4 in all columns
>>> df.na.replace([1, 2], [3, 4]).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|4    |4.0  |2.0  |
-------------------

>>> # replace 1 with 3 and 2 with 3 in all columns
>>> df.na.replace([1, 2], 3).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|3    |3.0  |2.0  |
-------------------

>>> # the following line replaces 1 with 3 and 2 with 4 in all columns
>>> # and gives [Row(3, 3.0, "1.0"), Row(4, 4.0, "2.0")]
>>> df.na.replace({1: 3, 2: 4}).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|4    |4.0  |2.0  |
-------------------

>>> # the following line intends to replace 1 with "3" in column "a",
>>> # but will be ignored since "3" (str) doesn't match the original data type
>>> df.na.replace({1: "3"}, ["a"]).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|1    |1.0  |1.0  |
|2    |2.0  |2.0  |
-------------------

Note

If the type of a given value in to_replace or value doesn’t match the column data type (e.g. a float for a StringType column), that replacement is skipped for that column. In particular,

  • int can replace or be replaced in a column with FloatType or DoubleType, but float cannot replace or be replaced in a column with IntegerType or LongType.

  • None can replace or be replaced in a column with any data type.
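The type rules in this note can be modeled locally to check expectations before running a query. The following is a simplified, hypothetical sketch in plain Python, not Snowpark's implementation (which runs as SQL in Snowflake): columns are described by type-name strings such as "LongType", rows are tuples, and the mapping is a plain dict such as {1: 3}. It covers only the int, float, str, bool, and None cases the note discusses, and it also shows why float("nan") needs special handling, since NaN never compares equal to itself in Python.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical type labels named after Snowpark's type classes.
INTEGRAL = {"IntegerType", "LongType"}
FRACTIONAL = {"FloatType", "DoubleType"}


def fits(value: Any, column_type: str) -> bool:
    """Simplified model of the note's rules: may value replace or be
    replaced in a column of column_type?"""
    if value is None:
        return True  # None is compatible with every column type
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return column_type == "BooleanType"
    if isinstance(value, int):
        return column_type in INTEGRAL or column_type in FRACTIONAL
    if isinstance(value, float):
        return column_type in FRACTIONAL
    if isinstance(value, str):
        return column_type == "StringType"
    return False  # other types are out of scope for this sketch


def matches(cell: Any, original: Any) -> bool:
    """Equality test that also lets float('nan') match a NaN cell."""
    if isinstance(cell, float) and isinstance(original, float):
        if math.isnan(cell) and math.isnan(original):
            return True
    return cell == original


def replace_rows(
    rows: Sequence[Tuple[Any, ...]],
    column_types: Sequence[str],
    mapping: Dict[Any, Any],
) -> List[Tuple[Any, ...]]:
    """Apply mapping to every cell, skipping pairs whose types do not fit."""
    result = []
    for row in rows:
        new_row = []
        for cell, column_type in zip(row, column_types):
            for original, replacement in mapping.items():
                # Skip this pair for this column if either type does not fit.
                if not (fits(original, column_type) and fits(replacement, column_type)):
                    continue
                if matches(cell, original):
                    # An int written into a float column is stored as a float.
                    if (column_type in FRACTIONAL and isinstance(replacement, int)
                            and not isinstance(replacement, bool)):
                        replacement = float(replacement)
                    cell = replacement
                    break  # stop at the first matching pair
            new_row.append(cell)
        result.append(tuple(new_row))
    return result
```

With the rows and schema from the examples above, replace_rows(rows, ["LongType", "DoubleType", "StringType"], {1: "3"}) leaves every row unchanged, matching the last example, while {1.0: 9} changes only the DoubleType column because a float cannot be replaced in a LongType column.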