This documentation covers an older version (1.9.0).

snowflake.snowpark.DataFrameNaFunctions.drop

DataFrameNaFunctions.drop(how: str = 'any', thresh: Optional[int] = None, subset: Optional[Union[str, Iterable[str]]] = None) → DataFrame

Returns a new DataFrame that excludes all rows containing fewer than a specified number of non-null and non-NaN values in the specified columns.
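The rule in the sentence above can be modeled in plain Python without a Snowflake session. This is only a sketch of the documented behavior, not Snowpark's implementation: the helper names is_valid and keep_row are hypothetical, None stands in for SQL NULL, and each tuple stands in for the checked columns of one row.

```python
import math
from typing import Any, Sequence


def is_valid(value: Any) -> bool:
    """Hypothetical helper: True when value is neither None (NULL) nor a float NaN."""
    if value is None:
        return False
    return not (isinstance(value, float) and math.isnan(value))


def keep_row(values: Sequence[Any], thresh: int) -> bool:
    """Hypothetical helper: keep the row when at least `thresh` checked values are valid."""
    return sum(is_valid(v) for v in values) >= thresh


# Same data as the examples on this page, as plain tuples (a, b).
rows = [(1.0, 1), (float("nan"), 2), (None, 3), (4.0, None), (float("nan"), None)]

# thresh=2 over two columns: every checked value must be valid.
kept_all_valid = [r for r in rows if keep_row(r, thresh=2)]

# thresh=1: one valid value is enough, so only the last row is excluded.
kept_any_valid = [r for r in rows if keep_row(r, thresh=1)]
```

With two checked columns, thresh=2 corresponds to how='any' and thresh=1 corresponds to how='all'.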

Parameters:
  • how – A str, either ‘any’ or ‘all’. If ‘any’, drop a row if any of the checked columns contains a null or NaN value. If ‘all’, drop a row only if all of the checked columns are null or NaN. The default value is ‘any’. If thresh is provided, how is ignored.

  • thresh

    The minimum number of non-null and non-NaN values that must be present in the specified columns for the row to be included. It overrides how. Special cases:

    • If thresh is not provided or None, the length of subset will be used when how is ‘any’ and 1 will be used when how is ‘all’.

    • If thresh is greater than the number of specified columns, the method returns an empty DataFrame.

    • If thresh is less than 1, the method returns the original DataFrame.

  • subset

    A column name, or a list of column names, to check for null and NaN values. Special cases:

    • If subset is not provided or None, all columns will be included.

    • If subset is empty, the method returns the original DataFrame.
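The way how, thresh, and subset combine can be sketched as a small pure-Python function. This is an illustrative model of the rules listed above, not Snowpark's code: resolve_drop_args and its outcome labels ("original", "empty", "filter") are hypothetical names, and rejecting other values of how with a ValueError is an assumption of the sketch.

```python
from typing import Iterable, List, Optional, Tuple, Union


def resolve_drop_args(
    all_columns: Iterable[str],
    how: str = "any",
    thresh: Optional[int] = None,
    subset: Optional[Union[str, Iterable[str]]] = None,
) -> Tuple[List[str], int, str]:
    """Hypothetical model: return (checked columns, effective thresh, outcome)."""
    if how not in ("any", "all"):
        # Assumption of this sketch: anything else is rejected.
        raise ValueError("how must be 'any' or 'all'")

    # subset: None means every column; a single name is treated as a one-item list.
    if subset is None:
        columns = list(all_columns)
    elif isinstance(subset, str):
        columns = [subset]
    else:
        columns = list(subset)

    # thresh, when given, takes precedence over how.
    if thresh is None:
        thresh = len(columns) if how == "any" else 1

    if thresh < 1 or not columns:
        outcome = "original"  # nothing can be dropped
    elif thresh > len(columns):
        outcome = "empty"     # no row can satisfy the threshold
    else:
        outcome = "filter"    # keep rows with at least `thresh` valid values
    return columns, thresh, outcome
```

For a DataFrame with columns "A" and "B", resolve_drop_args(["A", "B"], how="all", thresh=2) yields an effective threshold of 2, which shows thresh taking precedence over how.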

Examples:

>>> df = session.create_dataframe([[1.0, 1], [float('nan'), 2], [None, 3], [4.0, None], [float('nan'), None]]).to_df("a", "b")
>>> # drop a row if any column contains a null or NaN value (all columns are checked)
>>> df.na.drop().show()
-------------
|"A"  |"B"  |
-------------
|1.0  |1    |
-------------

>>> # drop a row only if every column is null or NaN (all columns are checked)
>>> df.na.drop(how='all').show()
---------------
|"A"   |"B"   |
---------------
|1.0   |1     |
|nan   |2     |
|NULL  |3     |
|4.0   |NULL  |
---------------

>>> # keep a row only if it contains at least one non-null and non-NaN value (all columns are checked)
>>> df.na.drop(thresh=1).show()
---------------
|"A"   |"B"   |
---------------
|1.0   |1     |
|nan   |2     |
|NULL  |3     |
|4.0   |NULL  |
---------------

>>> # drop a row if column "a" is null or NaN
>>> df.na.drop(subset=["a"]).show()
--------------
|"A"  |"B"   |
--------------
|1.0  |1     |
|4.0  |NULL  |
--------------

>>> # subset also accepts a single column name as a string
>>> df.na.drop(subset="a").show()
--------------
|"A"  |"B"   |
--------------
|1.0  |1     |
|4.0  |NULL  |
--------------