You are viewing documentation about an older version (1.18.0). View latest version

snowflake.snowpark.DataFrameNaFunctions.fill¶

DataFrameNaFunctions.fill(value: Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict, Dict[str, Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict]]], subset: Optional[Union[str, Iterable[str]]] = None) → DataFrame[source]¶

Returns a new DataFrame that replaces all null and NaN values in the specified columns with the values provided.

Parameters:
  • value – A scalar value or a dict that associates the names of columns with the values that should be used to replace null and NaN values in those columns. If value is a dict, subset is ignored. If value is an empty dict, the method returns the original DataFrame.

  • subset –

    A list of the names of columns to check for null and NaN values. In each case:

    • If subset is not provided or None, all columns will be included.

    • If subset is empty, the method returns the original DataFrame.

Examples:

>>> df = session.create_dataframe([[1.0, 1], [float('nan'), 2], [None, 3], [4.0, None], [float('nan'), None]]).to_df("a", "b")
>>> # fill null and NaN values in all columns
>>> df.na.fill(3.14).show()
---------------
|"A"   |"B"   |
---------------
|1.0   |1     |
|3.14  |2     |
|3.14  |3     |
|4.0   |NULL  |
|3.14  |NULL  |
---------------

>>> # fill null and NaN values in column "a"
>>> df.na.fill(3.14, subset="a").show()
---------------
|"A"   |"B"   |
---------------
|1.0   |1     |
|3.14  |2     |
|3.14  |3     |
|4.0   |NULL  |
|3.14  |NULL  |
---------------

>>> # fill null and NaN values in column "a"
>>> df.na.fill({"a": 3.14}).show()
---------------
|"A"   |"B"   |
---------------
|1.0   |1     |
|3.14  |2     |
|3.14  |3     |
|4.0   |NULL  |
|3.14  |NULL  |
---------------

>>> # fill null and NaN values in column "a" and "b"
>>> df.na.fill({"a": 3.14, "b": 15}).show()
--------------
|"A"   |"B"  |
--------------
|1.0   |1    |
|3.14  |2    |
|3.14  |3    |
|4.0   |15   |
|3.14  |15   |
--------------

>>> df2 = session.create_dataframe([[1.0, True], [2.0, False], [3.0, False], [None, None]]).to_df("a", "b")
>>> df2.na.fill(True).show()
----------------
|"A"   |"B"    |
----------------
|1.0   |True   |
|2.0   |False  |
|3.0   |False  |
|NULL  |True   |
----------------
Copy

Note

If the type of a given value in value doesn’t match the column data type (e.g. a float for StringType column), this replacement will be skipped in this column. Especially,