snowflake.snowpark.DataFrameNaFunctions.replace
- DataFrameNaFunctions.replace(to_replace: Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict, Iterable[Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict]], Dict[Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict], Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict]]], value: Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict, Iterable[Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict]]] = None, subset: Optional[Union[str, Iterable[str]]] = None, *, include_decimal: bool = False) → DataFrame
Returns a new DataFrame that replaces values in the specified columns.
- Parameters:
to_replace – A scalar value, a list of values, or a dict that associates the original values with the replacement values. If to_replace is a dict, value and subset are ignored. To replace a null value, use None in to_replace. To replace a NaN value, use float("nan") in to_replace. If to_replace is empty, the method returns the original DataFrame.
value – A scalar value, or a list of values for the replacement. If value is a list, value should be of the same length as to_replace. If value is a scalar and to_replace is a list, then value is used as a replacement for each item in to_replace.
subset – A list of the names of columns in which the values should be replaced. If subset is not provided or None, the replacement will be applied to all columns. If subset is empty, the method returns the original DataFrame.
include_decimal – Whether to allow Decimal values to replace IntegerType and FloatType values.
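The way to_replace and value combine can be sketched in plain Python. This is only an illustrative model of the rules stated above, not Snowpark's implementation: normalize_replacement is a hypothetical helper, and raising ValueError for mismatched lengths or for a list value paired with a scalar to_replace is an assumption of this sketch, since the text above does not say what happens in those cases.

```python
from typing import Any, List, Tuple


def normalize_replacement(to_replace: Any, value: Any = None) -> List[Tuple[Any, Any]]:
    """Hypothetical helper: expand the arguments into (old, new) pairs."""
    if isinstance(to_replace, dict):
        # A dict already carries its own replacements, so `value` is ignored.
        return list(to_replace.items())

    if not isinstance(to_replace, (list, tuple)):
        # Scalar to_replace: a single (old, new) pair.
        if isinstance(value, (list, tuple)):
            # Assumption of this sketch: the docs do not describe this case.
            raise ValueError("a list value needs a list to_replace")
        return [(to_replace, value)]

    olds = list(to_replace)
    if isinstance(value, (list, tuple)):
        # List value: pair items positionally; lengths should match.
        if len(value) != len(olds):
            # Assumption of this sketch: reject mismatched lengths.
            raise ValueError("to_replace and value differ in length")
        return list(zip(olds, value))

    # Scalar value: reused for every item in to_replace.
    return [(old, value) for old in olds]


print(normalize_replacement([1, 2], [3, 4]))  # [(1, 3), (2, 4)]
print(normalize_replacement([1, 2], 3))       # [(1, 3), (2, 3)]
```

An empty to_replace yields no pairs at all, which corresponds to the case where the method returns the original DataFrame. Pairs are returned as a list rather than a dict so that a float("nan") entry does not depend on dictionary key lookup.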
Examples:
>>> df = session.create_dataframe([[1, 1.0, "1.0"], [2, 2.0, "2.0"]], schema=["a", "b", "c"])
>>> # replace 1 with 3 in all columns
>>> df.na.replace(1, 3).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|2    |2.0  |2.0  |
-------------------
>>> # replace 1 with 3 and 2 with 4 in all columns
>>> df.na.replace([1, 2], [3, 4]).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|4    |4.0  |2.0  |
-------------------
>>> # replace 1 with 3 and 2 with 3 in all columns
>>> df.na.replace([1, 2], 3).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|3    |3.0  |2.0  |
-------------------
>>> # the following line replaces 1 with 3 and 2 with 4 in all columns
>>> # and will give [Row(3, 3.0, "1.0"), Row(4, 4.0, "2.0")]
>>> df.na.replace({1: 3, 2: 4}).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|3    |3.0  |1.0  |
|4    |4.0  |2.0  |
-------------------
>>> # the following line intends to replace 1 with "3" in column "a",
>>> # but will be ignored since "3" (str) doesn't match the original data type
>>> df.na.replace({1: "3"}, ["a"]).show()
-------------------
|"A"  |"B"  |"C"  |
-------------------
|1    |1.0  |1.0  |
|2    |2.0  |2.0  |
-------------------
Note
If the type of a given value in to_replace or value doesn't match the column data type (e.g. a float for a StringType column), the replacement is skipped for that column. In particular, an int can replace or be replaced in a FloatType or DoubleType column, but a float cannot replace or be replaced in an IntegerType or LongType column. None can replace or be replaced in a column of any data type.
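The matching rules in this note can be modelled with a small, self-contained check. This is a rough sketch under stated assumptions rather than the library's own logic: value_matches_column and replacement_applies are hypothetical names, column types are represented as plain strings, only None, int, float and str values are covered, and treating bool as matching only a BooleanType column is an assumption that the note does not state.

```python
from typing import Any

INTEGRAL = {"IntegerType", "LongType"}
FRACTIONAL = {"FloatType", "DoubleType"}


def value_matches_column(value: Any, column_type: str) -> bool:
    """Hypothetical check: can `value` appear in a column of this type?"""
    if value is None:
        # None is compatible with every column type.
        return True
    if isinstance(value, bool):
        # Assumption of this sketch: bool is an int subclass in Python,
        # so it is tested first and kept apart from the numeric columns.
        return column_type == "BooleanType"
    if isinstance(value, int):
        # An int fits integer columns and also float/double columns.
        return column_type in INTEGRAL | FRACTIONAL
    if isinstance(value, float):
        # A float fits float/double columns only, never integer columns.
        return column_type in FRACTIONAL
    if isinstance(value, str):
        return column_type == "StringType"
    # Other value types (Decimal, dates, ...) are outside this sketch.
    return False


def replacement_applies(old: Any, new: Any, column_type: str) -> bool:
    """A pair is applied only if both sides match the column type."""
    return value_matches_column(old, column_type) and value_matches_column(new, column_type)


print(replacement_applies(1, 3, "DoubleType"))   # True
print(replacement_applies(1, "3", "LongType"))   # False
print(replacement_applies(1.0, 3.0, "LongType")) # False
```

Under this model, replacing 1 with 3 touches a LongType and a DoubleType column but leaves a StringType column alone, which lines up with the example outputs shown above.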
See also