modin.pandas.DataFrame.drop_duplicates
- DataFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False) → Optional[DataFrame]
Return DataFrame with duplicate rows removed.

Considering only certain columns is optional. Index labels, including time indexes, are ignored when identifying duplicates.
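Because index labels are ignored, two rows with identical values count as duplicates even when their labels differ. The sketch below assumes plain pandas as a stand-in (modin.pandas is designed to mirror the pandas API, so `import modin.pandas as pd` should behave the same given a configured modin engine); the frame and its date labels are hypothetical.

```python
import pandas as pd  # assumption: pandas used as a stand-in for modin.pandas

# Two rows with identical values but different (hypothetical) time-index labels.
df = pd.DataFrame(
    {"brand": ["Yum Yum", "Yum Yum"], "rating": [4.0, 4.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)

# The index is not compared, so the second row is a duplicate of the first.
deduped = df.drop_duplicates()
print(len(deduped))  # 1
```

The surviving row keeps its original label (2024-01-01), since `keep='first'` is the default.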
- Parameters:
subset (column label or sequence of labels, optional) – Only consider certain columns for identifying duplicates; by default, all columns are used.
keep ({'first', 'last', False}, default 'first') – Determines which duplicates (if any) to keep.
  - 'first' : Drop duplicates except for the first occurrence.
  - 'last' : Drop duplicates except for the last occurrence.
  - False : Drop all duplicates.
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
- Returns:
DataFrame with duplicates removed, or None if inplace=True.
- Return type:
DataFrame or None
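The effects of `keep=False`, `ignore_index=True` and `inplace=True` can be seen on a small frame. This is a minimal sketch assuming plain pandas in place of modin.pandas (whose API it is meant to mirror); the data is hypothetical.

```python
import pandas as pd  # assumption: pandas used as a stand-in for modin.pandas

# Hypothetical data: rows 0 and 1 are exact duplicates.
data = {"brand": ["Yum Yum", "Yum Yum", "Indomie"], "rating": [4.0, 4.0, 3.5]}

# keep=False drops every row of a duplicated group, not just the repeats.
no_dupes = pd.DataFrame(data).drop_duplicates(keep=False)
print(no_dupes.index.tolist())  # [2]

# ignore_index=True relabels the surviving rows 0, 1, ..., n - 1.
relabelled = pd.DataFrame(data).drop_duplicates(ignore_index=True)
print(relabelled.index.tolist())  # [0, 1]

# inplace=True modifies the frame itself and returns None.
df = pd.DataFrame(data)
result = df.drop_duplicates(inplace=True)
print(result)             # None
print(df.index.tolist())  # [0, 2]
```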
Examples
Consider a dataset containing ramen ratings.
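A small ramen-ratings table of this kind might look like the following. The values are illustrative assumptions, and plain pandas stands in for modin.pandas, which aims to expose the same API.

```python
import pandas as pd  # assumption: pandas used as a stand-in for modin.pandas

# Illustrative ramen ratings; rows 0 and 1 are identical in every column.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})
print(df)
```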
By default, it removes duplicate rows based on all columns.
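With no arguments, only rows that match an earlier row in every column are dropped. A sketch on an illustrative ramen table (assumed values), using pandas as a stand-in for modin.pandas:

```python
import pandas as pd  # assumption: pandas used as a stand-in for modin.pandas

# Illustrative ramen ratings (assumed values).
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# Row 1 repeats row 0 in all columns, so it is the only row removed.
default_result = df.drop_duplicates()
print(default_result.index.tolist())  # [0, 2, 3, 4]
```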
To remove duplicates on specific column(s), use subset.
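Passing `subset` restricts the comparison to the named columns, so rows that differ elsewhere can still be dropped. A sketch on the same kind of illustrative table (assumed values), with pandas standing in for modin.pandas:

```python
import pandas as pd  # assumption: pandas used as a stand-in for modin.pandas

# Illustrative ramen ratings (assumed values).
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# Only 'brand' is compared: the first row of each brand is kept.
by_brand = df.drop_duplicates(subset=["brand"])
print(by_brand.index.tolist())  # [0, 2]
```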
To remove duplicates and keep last occurrences, use keep.
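Combining `subset` with `keep='last'` keeps the final row of each duplicated group instead of the first. A sketch on an illustrative table (assumed values), using pandas as a stand-in for modin.pandas:

```python
import pandas as pd  # assumption: pandas used as a stand-in for modin.pandas

# Illustrative ramen ratings (assumed values).
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# Group on (brand, style) and keep the last row of each group.
last_rows = df.drop_duplicates(subset=["brand", "style"], keep="last")
print(last_rows.index.tolist())  # [1, 2, 4]
```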