modin.pandas.DataFrame.duplicated
- DataFrame.duplicated(subset=None, keep='first') → Series
Return boolean Series denoting duplicate rows.
Considering certain columns is optional.
- Parameters:
subset (column label or sequence of labels, optional) – Only consider certain columns for identifying duplicates, by default use all the columns.
keep ({'first', 'last', False}, default 'first') –
Determines which duplicates (if any) to mark.
first: Mark duplicates as True except for the first occurrence.
last: Mark duplicates as True except for the last occurrence.
False: Mark all duplicates as True.
- Returns:
Boolean Series indicating, for each row, whether it is a duplicate.
- Return type:
Series
See also
Index.duplicated – Equivalent method on index.
Series.duplicated – Equivalent method on Series.
Series.drop_duplicates – Remove duplicate values from Series.
DataFrame.drop_duplicates – Remove duplicate values from DataFrame.
Examples
Consider a dataset containing ramen ratings.
By default, for each set of duplicated rows, the first occurrence is marked False and all others are marked True.
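A minimal sketch of the default behaviour. It imports plain pandas so that it runs without a Modin engine; replacing the import with `import modin.pandas as pd` is expected to give the same result. The ramen data is made up for illustration and is not taken from this page.

```python
import pandas as pd  # assumption: `import modin.pandas as pd` behaves the same

# Hypothetical ramen-rating data; rows 0 and 1 are identical.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# keep='first' is the default: only the repeat (row 1) is flagged.
default_mask = df.duplicated()
print(default_mask.tolist())  # [False, True, False, False, False]
```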
With keep='last', the last occurrence of each set of duplicated rows is marked False and all others are marked True.
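As a sketch of `keep='last'`, again with plain pandas standing in for `modin.pandas` and with illustrative data that is not from this page:

```python
import pandas as pd  # assumption: `import modin.pandas as pd` behaves the same

# Hypothetical ramen-rating data; rows 0 and 1 are identical.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# keep='last': the final copy (row 1) is kept, the earlier copy (row 0) is flagged.
last_mask = df.duplicated(keep="last")
print(last_mask.tolist())  # [True, False, False, False, False]
```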
By setting keep to False, all duplicates are marked True.
To find duplicates on specific column(s), use subset.
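The two remaining options can be sketched together. As before, plain pandas stands in for `modin.pandas`, and the data is invented for illustration.

```python
import pandas as pd  # assumption: `import modin.pandas as pd` behaves the same

# Hypothetical ramen-rating data; rows 0 and 1 are identical.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# keep=False: every member of a duplicate group is flagged.
all_mask = df.duplicated(keep=False)
print(all_mask.tolist())  # [True, True, False, False, False]

# subset: compare only the 'brand' column; repeats of a brand are flagged.
brand_mask = df.duplicated(subset=["brand"])
print(brand_mask.tolist())  # [False, True, False, True, True]
```

The resulting mask can be used for boolean indexing, for example `df[~all_mask]` to keep only rows that have no duplicate at all.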