Class DataFrameNaFunctions


  • public class DataFrameNaFunctions
    extends Object
    Provides functions for handling missing values in a DataFrame.
    Since:
    1.1.0
    • Method Detail

      • drop

        public DataFrame drop(int minNonNullsPerRow,
                              String[] cols)
        Returns a new DataFrame that excludes all rows containing fewer than minNonNullsPerRow non-null and non-NaN values in the specified columns cols.

        If minNonNullsPerRow is greater than the number of the specified columns, the method returns an empty DataFrame. If minNonNullsPerRow is less than 1, the method returns the original DataFrame. If cols is empty, the method returns the original DataFrame.

        Parameters:
        minNonNullsPerRow - The minimum number of non-null and non-NaN values that should be in the specified columns in order for the row to be included.
        cols - An array of the names of the columns to check for null and NaN values.
        Returns:
        A DataFrame
        Since:
        1.1.0
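The drop rules above (count the non-null, non-NaN values in cols, keep a row only if that count reaches minNonNullsPerRow, and return everything when cols is empty or the threshold is below 1) can be illustrated with a small in-memory model. This is a minimal sketch, not the Snowpark implementation: it assumes rows are plain Map<String, Object> instances, and the class NaDropSketch and its helpers isMissing and row are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the documented drop(int, String[]) rules.
// Rows are plain Maps; this is NOT how Snowpark evaluates the call.
public class NaDropSketch {

    // A value counts as missing if it is null or a floating-point NaN.
    static boolean isMissing(Object v) {
        return v == null
            || (v instanceof Double && ((Double) v).isNaN())
            || (v instanceof Float && ((Float) v).isNaN());
    }

    static List<Map<String, Object>> drop(List<Map<String, Object>> rows,
                                          int minNonNullsPerRow, String[] cols) {
        // Documented edge cases: no columns to check, or a threshold below 1.
        if (cols.length == 0 || minNonNullsPerRow < 1) {
            return new ArrayList<>(rows);
        }
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            int present = 0;
            for (String c : cols) {
                if (!isMissing(row.get(c))) present++;
            }
            // A threshold above cols.length can never be met, so every row is dropped.
            if (present >= minNonNullsPerRow) kept.add(row);
        }
        return kept;
    }

    // Hypothetical helper that builds a row with columns "a" and "b".
    static Map<String, Object> row(Object a, Object b) {
        Map<String, Object> r = new HashMap<>();
        r.put("a", a);
        r.put("b", b);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row(1.0, "x"));        // 2 usable values
        rows.add(row(Double.NaN, "y")); // 1 usable value (NaN counts as missing)
        rows.add(row(null, null));      // 0 usable values
        String[] cols = {"a", "b"};

        System.out.println(drop(rows, 2, cols).size()); // 1
        System.out.println(drop(rows, 1, cols).size()); // 2
        System.out.println(drop(rows, 3, cols).size()); // 0
        System.out.println(drop(rows, 0, cols).size()); // 3
    }
}
```

Because the count of usable values can never exceed the number of (distinct) columns checked, the "greater than the number of columns" case needs no special branch in this model; it falls out of the comparison.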
      • fill

        public DataFrame fill(Map<String, ?> valueMap)
        Returns a new DataFrame that replaces all null and NaN values in the specified columns with the values provided.

        valueMap specifies which columns to fill and the replacement value to use for each of them.

        It supports only Long, Integer, Short, Byte, String, Boolean, Float, and Double values. If the type of a given value doesn't match the column type (for example, a Long value for a StringType column), the replacement in that column is skipped.

        Parameters:
        valueMap - A Map that associates the names of columns with the values that should be used to replace null and NaN values in those columns.
        Returns:
        A DataFrame
        Since:
        1.1.0
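The fill behavior described above (replace null and NaN values column by column, and skip a column whose type does not match its replacement value) can be modeled in plain Java. This is a minimal sketch under stated assumptions: column types are represented as Java classes in a hypothetical schema map, rows are Map<String, Object>, and a value "matches" only if it is an instance of the column's class. That exact-class rule is an assumption of the sketch and may be stricter than the real API, which works on SQL column types. NaFillSketch and isMissing are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of fill(Map<String, ?>).
// Column types are modeled as Java classes; Snowpark itself works on SQL types.
public class NaFillSketch {

    // A value counts as missing if it is null or a floating-point NaN.
    static boolean isMissing(Object v) {
        return v == null
            || (v instanceof Double && ((Double) v).isNaN())
            || (v instanceof Float && ((Float) v).isNaN());
    }

    static List<Map<String, Object>> fill(List<Map<String, Object>> rows,
                                          Map<String, Class<?>> schema,
                                          Map<String, ?> valueMap) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> copy = new HashMap<>(row); // leave the input untouched
            for (Map.Entry<String, ?> e : valueMap.entrySet()) {
                String col = e.getKey();
                Object value = e.getValue();
                Class<?> colType = schema.get(col);
                // Skip unknown columns and values whose type does not match the column.
                if (colType == null || !colType.isInstance(value)) continue;
                if (isMissing(copy.get(col))) copy.put(col, value);
            }
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> schema = new HashMap<>();
        schema.put("price", Double.class);
        schema.put("name", String.class);

        Map<String, Object> r = new HashMap<>();
        r.put("price", Double.NaN);
        r.put("name", null);
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(r);

        Map<String, Object> values = new HashMap<>();
        values.put("price", 0.0); // Double for a Double column: applied
        values.put("name", 42L);  // Long for a String column: skipped

        Map<String, Object> filled = fill(rows, schema, values).get(0);
        System.out.println(filled.get("price")); // 0.0
        System.out.println(filled.get("name"));  // null
    }
}
```

Returning copies mirrors the documented contract that fill returns a new DataFrame rather than modifying the existing one.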
      • replace

        public DataFrame replace(String colName,
                                 Map<?, ?> replacement)
        Returns a new DataFrame that replaces values in a specified column.

        Use the replacement parameter to specify a Map that associates the values to replace with new values.

        For example, suppose that you pass `col1` for colName and, for replacement, a Map that maps 2 to 3, null to 2, and 4 to null. In `col1`, this method replaces `2` with `3`, null with `2`, and `4` with null.

        Parameters:
        colName - The name of the column in which the values should be replaced.
        replacement - A Map that associates the original values with the replacement values.
        Returns:
        The result DataFrame
        Since:
        1.1.0
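The replacement map in the example above contains a null key and a null value, which matters when building it in Java: HashMap accepts both, whereas Map.of(...) throws a NullPointerException. The sketch below builds that map and applies it to an in-memory column to show the expected result. It is a minimal model, not the Snowpark implementation: the class NaReplaceSketch is hypothetical, the column is assumed to be a List of values, and keys are matched with Java equals, so an Integer 2 would not match a Long 2L (a property of this sketch, not necessarily of the real API).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of replace(String, Map<?, ?>) applied to one column.
public class NaReplaceSketch {

    static List<Object> replace(List<?> column, Map<?, ?> replacement) {
        List<Object> out = new ArrayList<>(column.size());
        for (Object v : column) {
            // containsKey separates "maps to null" from "has no mapping";
            // each value is looked up once, so replacements are not chained.
            out.add(replacement.containsKey(v) ? replacement.get(v) : v);
        }
        return out;
    }

    public static void main(String[] args) {
        // HashMap accepts null keys and values; Map.of(...) would throw a NullPointerException.
        Map<Object, Object> replacement = new HashMap<>();
        replacement.put(2, 3);
        replacement.put(null, 2);
        replacement.put(4, null);

        List<Object> col1 = Arrays.asList(1, 2, null, 4);
        System.out.println(replace(col1, replacement)); // [1, 3, 2, null]
    }
}
```

Looking up each original value exactly once matches the documented example: a null becomes 2 and stays 2, rather than being replaced again by the 2 -> 3 entry.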