You are viewing documentation about an older version (1.3.0). View latest version

snowflake.snowpark.DataFrameNaFunctions.drop¶

DataFrameNaFunctions.drop(how: str = 'any', thresh: int | None = None, subset: str | Iterable[str] | None = None) → DataFrame[source]¶

Returns a new DataFrame that excludes all rows containing fewer than a specified number of non-null and non-NaN values in the specified columns.

Parameters:

how – An str with value either ‘any’ or ‘all’. If ‘any’, drop a row if it contains any nulls. If ‘all’, drop a row only if all its values are null. The default value is ‘any’. If thresh is provided, how will be ignored.
thresh –
The minimum number of non-null and non-NaN values that should be in the specified columns in order for the row to be included. It overwrites how. In each case:
- If thresh is not provided or None, the length of subset will be used when how is ‘any’ and 1 will be used when how is ‘all’.
- If thresh is greater than the number of the specified columns, the method returns an empty DataFrame.
- If thresh is less than 1, the method returns the original DataFrame.
subset –
A list of the names of columns to check for null and NaN values. In each case:
- If subset is not provided or None, all columns will be included.
- If subset is empty, the method returns the original DataFrame.

Examples:

>>> df = session.create_dataframe([[1.0, 1], [float('nan'), 2], [None, 3], [4.0, None], [float('nan'), None]]).to_df("a", "b")
>>> # drop a row if it contains any nulls, with checking all columns
>>> df.na.drop().show()
-------------
|"A"  |"B"  |
-------------
|1.0  |1    |
-------------

>>> # drop a row only if all its values are null, with checking all columns
>>> df.na.drop(how='all').show()
---------------
|"A"   |"B"   |
---------------
|1.0   |1     |
|nan   |2     |
|NULL  |3     |
|4.0   |NULL  |
---------------

>>> # drop a row if it contains at least one non-null and non-NaN values, with checking all columns
>>> df.na.drop(thresh=1).show()
---------------
|"A"   |"B"   |
---------------
|1.0   |1     |
|nan   |2     |
|NULL  |3     |
|4.0   |NULL  |
---------------

>>> # drop a row if it contains any nulls, with checking column "a"
>>> df.na.drop(subset=["a"]).show()
--------------
|"A"  |"B"   |
--------------
|1.0  |1     |
|4.0  |NULL  |
--------------

>>> df.na.drop(subset="a").show()
--------------
|"A"  |"B"   |
--------------
|1.0  |1     |
|4.0  |NULL  |
--------------

Copy