Class RelationalGroupedDataFrame


  • public class RelationalGroupedDataFrame
    extends Object
    Represents an underlying DataFrame with rows that are grouped by common values. Can be used to define aggregations on these grouped DataFrames.
    Since:
    0.9.0
    • Method Detail

      • agg

        public DataFrame agg(Column... cols)
        Returns a DataFrame with aggregates computed according to the supplied Column expressions. The Functions class provides built-in aggregate functions that can be used here.
        Parameters:
        cols - The aggregate expressions to compute
        Returns:
        The result DataFrame
        Since:
        0.9.0
        See Also:
        Functions
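As a rough illustration of what a grouped aggregation computes, the following plain-Java sketch (not code for this API) groups in-memory rows by a key and derives several aggregates per group in one pass. The Row type, the class name GroupedAggSketch, and the sample data are hypothetical; a real agg call operates on a DataFrame and Column expressions rather than on an in-memory list.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupedAggSketch {
    // Hypothetical row type: a grouping key plus one numeric value.
    public static final class Row {
        final String key;
        final double value;

        public Row(String key, double value) {
            this.key = key;
            this.value = value;
        }
    }

    // Groups rows by key and computes count, sum, min, max, and average per group.
    public static Map<String, DoubleSummaryStatistics> aggregate(List<Row> rows) {
        return rows.stream().collect(
            Collectors.groupingBy(r -> r.key, TreeMap::new,
                Collectors.summarizingDouble(r -> r.value)));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("a", 1.0), new Row("a", 3.0), new Row("b", 10.0));
        // One output entry per group, each carrying several aggregates.
        aggregate(rows).forEach((key, stats) ->
            System.out.println(key + " max=" + stats.getMax() + " sum=" + stats.getSum()));
    }
}
```

The result has one entry per distinct key, which mirrors the one-row-per-group shape of the DataFrame that agg returns.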
      • avg

        public DataFrame avg(Column... cols)
        Returns the average of the specified numeric columns for each group.
        Parameters:
        cols - The input column list
        Returns:
        The result DataFrame
        Since:
        1.1.0
      • mean

        public DataFrame mean(Column... cols)
        Returns the average of the specified numeric columns for each group. Alias of avg.
        Parameters:
        cols - The input column list
        Returns:
        The result DataFrame
        Since:
        1.1.0
      • sum

        public DataFrame sum(Column... cols)
        Returns the sum of the specified numeric columns for each group.
        Parameters:
        cols - The input column list
        Returns:
        The result DataFrame
        Since:
        1.1.0
      • median

        public DataFrame median(Column... cols)
        Returns the median of the specified numeric columns for each group.
        Parameters:
        cols - The input column list
        Returns:
        The result DataFrame
        Since:
        1.1.0
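To make the per-group median concrete, here is a plain-Java sketch (not code for this API). It assumes the common convention that a group with an even number of values yields the mean of its two middle values; the text above does not state which convention applies, so treat that as an assumption. The class name GroupedMedianSketch and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupedMedianSketch {
    // Median of one group. For an even count, averages the two middle
    // values (an assumed convention, not confirmed by the documentation).
    public static double median(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("group must not be empty");
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        return sorted.size() % 2 == 1
            ? sorted.get(mid)
            : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }

    // Applies median to every group, keyed by the grouping value.
    public static Map<String, Double> medianByGroup(Map<String, List<Double>> groups) {
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> entry : groups.entrySet()) {
            result.put(entry.getKey(), median(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Double>> groups = new TreeMap<>();
        groups.put("a", Arrays.asList(5.0, 1.0, 3.0)); // odd count: middle value
        groups.put("b", Arrays.asList(4.0, 2.0));      // even count: mean of the two middle values
        System.out.println(medianByGroup(groups));     // prints {a=3.0, b=3.0}
    }
}
```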
      • min

        public DataFrame min(Column... cols)
        Returns the minimum of the specified numeric columns for each group.
        Parameters:
        cols - The input column list
        Returns:
        The result DataFrame
        Since:
        1.1.0
      • max

        public DataFrame max(Column... cols)
        Returns the maximum of the specified numeric columns for each group.
        Parameters:
        cols - The input column list
        Returns:
        The result DataFrame
        Since:
        1.1.0
      • any_value

        public DataFrame any_value(Column... cols)
        Returns an arbitrary, non-deterministic value from each group for each of the specified columns.
        Parameters:
        cols - The input column list
        Returns:
        The result DataFrame
        Since:
        1.1.0
      • count

        public DataFrame count()
        Returns the number of rows in each group.
        Returns:
        The result DataFrame
        Since:
        1.1.0
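Because count here is computed per group, the result holds one count per distinct grouping value rather than a single number. The plain-Java sketch below (not code for this API) shows that shape; the class name GroupedCountSketch and the sample keys are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GroupedCountSketch {
    // Counts how many rows fall into each group, keyed by the grouping value.
    public static Map<String, Long> countByKey(List<String> keys) {
        return keys.stream().collect(
            Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Four input rows collapse to two groups.
        System.out.println(countByKey(Arrays.asList("a", "b", "a", "a"))); // prints {a=3, b=1}
    }
}
```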
      • builtin

        public DataFrame builtin(String aggName,
                                 Column... cols)
        Computes the built-in aggregate function named by aggName over the specified columns. Use this method to invoke any aggregate function not explicitly listed in this class.

        For example:

        
         df.groupBy("col1").builtin("max", df.col("col2"));
         
        Parameters:
        aggName - The name of an aggregate function.
        cols - The arguments to pass to the aggregate function.
        Returns:
        The result DataFrame
        Since:
        1.1.0