DataFrame |
DataFrame.agg(Column... exprs) |
Aggregate the data in the DataFrame.
|
DataFrame |
RelationalGroupedDataFrame.agg(Column... cols) |
Returns a DataFrame with aggregates computed according to the supplied Column expressions.
|
DataFrame |
DataFrame.alias(String alias) |
Returns the current DataFrame aliased as the input alias name.
|
DataFrame |
RelationalGroupedDataFrame.any_value(Column... cols) |
Returns non-deterministic values for the specified columns.
|
DataFrame |
RelationalGroupedDataFrame.avg(Column... cols) |
Returns the average for the specified numeric columns.
|
DataFrame |
RelationalGroupedDataFrame.builtin(String aggName,
Column... cols) |
Computes the builtin aggregate 'aggName' over the specified columns.
|
DataFrame |
DataFrame.clone() |
Returns a clone of this DataFrame.
|
DataFrame |
RelationalGroupedDataFrame.count() |
Returns the number of rows for each group.
|
DataFrame |
Session.createDataFrame(Row[] data,
StructType schema) |
Creates a new DataFrame that uses the specified schema and contains the specified Row objects.
|
DataFrame |
DataFrame.crossJoin(DataFrame right) |
Performs a cross join, which returns the cartesian product of the current DataFrame and another
DataFrame (`right`).
|
DataFrame |
DataFrameStatFunctions.crosstab(String col1,
String col2) |
Computes a pair-wise frequency table (a "contingency table") for the specified columns.
|
DataFrame |
DataFrame.distinct() |
Returns a new DataFrame that contains only the rows with distinct values from the current
DataFrame.
|
DataFrame |
DataFrame.drop(Column... columns) |
Returns a new DataFrame that excludes the specified columns from the output.
|
DataFrame |
DataFrame.drop(String... columnNames) |
Returns a new DataFrame that excludes the columns with the specified names from the output.
|
DataFrame |
DataFrameNaFunctions.drop(int minNonNullsPerRow,
String[] cols) |
Returns a new DataFrame that excludes all rows containing fewer than `minNonNullsPerRow`
non-null and non-NaN values in the specified columns (`cols`).
|
DataFrame |
DataFrame.dropDuplicates(String... colNames) |
Creates a new DataFrame by removing rows that are duplicated on the given subset of columns.
|
DataFrame |
DataFrame.except(DataFrame other) |
Returns a new DataFrame that contains all the rows from the current DataFrame except for the
rows that also appear in another DataFrame (`other`).
|
DataFrame |
DataFrameNaFunctions.fill(Map<String,?> valueMap) |
Returns a new DataFrame that replaces all null and NaN values in the specified columns with the
values provided.
|
DataFrame |
DataFrame.filter(Column condition) |
Filters rows based on the specified conditional expression (similar to WHERE in SQL).
|
DataFrame |
DataFrame.flatten(Column input) |
Flattens (explodes) compound values into multiple rows (similar to the SQL FLATTEN function).
|
DataFrame |
DataFrame.flatten(Column input,
String path,
boolean outer,
boolean recursive,
String mode) |
Flattens (explodes) compound values into multiple rows (similar to the SQL FLATTEN function).
|
DataFrame |
Session.flatten(Column input) |
Creates a new DataFrame by flattening compound values into multiple rows.
|
DataFrame |
Session.flatten(Column input,
String path,
boolean outer,
boolean recursive,
String mode) |
Creates a new DataFrame by flattening compound values into multiple rows.
|
DataFrame |
Session.generator(long rowCount,
Column... columns) |
Creates a new DataFrame via a generator function.
|
DataFrame |
DataFrame.intersect(DataFrame other) |
Returns a new DataFrame that contains the intersection of rows from the current DataFrame and
another DataFrame (`other`).
|
DataFrame |
DataFrame.join(Column func) |
Joins the current DataFrame with the output of the specified table function `func`.
|
DataFrame |
DataFrame.join(Column func,
Column[] partitionBy,
Column[] orderBy) |
Joins the current DataFrame with the output of the specified table function `func`.
|
DataFrame |
DataFrame.join(DataFrame right) |
Performs a default inner join of the current DataFrame and another DataFrame (`right`).
|
DataFrame |
DataFrame.join(DataFrame right,
Column joinExpr) |
Performs a default inner join of the current DataFrame and another DataFrame (`right`) using
the join condition specified in an expression (`joinExpr`).
|
DataFrame |
DataFrame.join(DataFrame right,
Column joinExpr,
String joinType) |
Performs a join of the specified type (`joinType`) of the current DataFrame and another
DataFrame (`right`) using the join condition specified in an expression (`joinExpr`).
|
DataFrame |
DataFrame.join(DataFrame right,
String usingColumn) |
Performs a default inner join of the current DataFrame and another DataFrame (`right`) on a
column (`usingColumn`).
|
DataFrame |
DataFrame.join(DataFrame right,
String[] usingColumns) |
Performs a default inner join of the current DataFrame and another DataFrame (`right`) on a
list of columns (`usingColumns`).
|
DataFrame |
DataFrame.join(DataFrame right,
String[] usingColumns,
String joinType) |
Performs a join of the specified type (`joinType`) of the current DataFrame and another
DataFrame (`right`) on a list of columns (`usingColumns`).
|
DataFrame |
DataFrame.join(TableFunction func,
Column... args) |
Joins the current DataFrame with the output of the specified table function `func`.
|
DataFrame |
DataFrame.join(TableFunction func,
Column[] args,
Column[] partitionBy,
Column[] orderBy) |
Joins the current DataFrame with the output of the specified user-defined table function (UDTF)
`func`.
|
DataFrame |
DataFrame.join(TableFunction func,
Map<String,Column> args) |
Joins the current DataFrame with the output of the specified table function `func` that takes
named parameters.
|
DataFrame |
DataFrame.join(TableFunction func,
Map<String,Column> args,
Column[] partitionBy,
Column[] orderBy) |
Joins the current DataFrame with the output of the specified user-defined table function (UDTF)
`func`.
|
DataFrame |
DataFrame.limit(int n) |
Returns a new DataFrame that contains at most `n` rows from the current DataFrame (similar to
LIMIT in SQL).
|
DataFrame |
RelationalGroupedDataFrame.max(Column... cols) |
Returns the maximum for the specified numeric columns.
|
DataFrame |
RelationalGroupedDataFrame.mean(Column... cols) |
Returns the average for the specified numeric columns.
|
DataFrame |
RelationalGroupedDataFrame.median(Column... cols) |
Returns the median for the specified numeric columns.
|
DataFrame |
RelationalGroupedDataFrame.min(Column... cols) |
Returns the minimum for the specified numeric columns.
|
DataFrame |
DataFrame.naturalJoin(DataFrame right) |
Performs a natural join (a default inner join) of the current DataFrame and another DataFrame
(`right`).
|
DataFrame |
DataFrame.naturalJoin(DataFrame right,
String joinType) |
Performs a natural join of the specified type (`joinType`) of the current DataFrame and
another DataFrame (`right`).
|
DataFrame[] |
DataFrame.randomSplit(double[] weights) |
Randomly splits the current DataFrame into separate DataFrames, using the specified weights.
|
DataFrame |
Session.range(long end) |
Creates a new DataFrame from a range of numbers starting from 0.
|
DataFrame |
Session.range(long start,
long end) |
Creates a new DataFrame from a range of numbers.
|
DataFrame |
Session.range(long start,
long end,
long step) |
Creates a new DataFrame from a range of numbers.
|
DataFrame |
DataFrame.rename(String newName,
Column col) |
Returns a DataFrame with the specified column `col` renamed as `newName`.
|
DataFrame |
DataFrameNaFunctions.replace(String colName,
Map<?,?> replacement) |
Returns a new DataFrame that replaces values in a specified column.
|
DataFrame |
DataFrame.sample(double probabilityFraction) |
Returns a new DataFrame that contains a sampling of rows from the current DataFrame.
|
DataFrame |
DataFrame.sample(long num) |
Returns a new DataFrame with a sample of `num` rows from the underlying DataFrame.
|
DataFrame |
DataFrameStatFunctions.sampleBy(Column col,
Map<?,Double> fractions) |
Returns a DataFrame containing a stratified sample without replacement, based on a Map that
specifies the fraction for each stratum.
|
DataFrame |
DataFrameStatFunctions.sampleBy(String colName,
Map<?,Double> fractions) |
Returns a DataFrame containing a stratified sample without replacement, based on a Map that
specifies the fraction for each stratum.
|
DataFrame |
DataFrame.select(Column... columns) |
Generates a new DataFrame with the specified Column expressions as output (similar to SELECT in
SQL).
|
DataFrame |
DataFrame.select(String... columnNames) |
Returns a new DataFrame with a subset of named columns (similar to SELECT in SQL).
|
DataFrame |
DataFrame.sort(Column... sortExprs) |
Sorts a DataFrame by the specified expressions (similar to ORDER BY in SQL).
|
DataFrame |
Session.sql(String query) |
Returns a new DataFrame representing the results of a SQL query.
|
DataFrame |
Session.storedProcedure(StoredProcedure sp,
Object... args) |
Creates a new DataFrame from the given stored procedure and arguments.
|
DataFrame |
Session.storedProcedure(String spName,
Object... args) |
Creates a new DataFrame from the given stored procedure and arguments.
|
DataFrame |
RelationalGroupedDataFrame.sum(Column... cols) |
Returns the sum for the specified numeric columns.
|
DataFrame |
DataFrameReader.table(String name) |
Returns a DataFrame that is set up to load data from the specified table.
|
DataFrame |
Session.tableFunction(Column func) |
Creates a new DataFrame from the given table function and arguments.
|
DataFrame |
Session.tableFunction(TableFunction func,
Column... args) |
Creates a new DataFrame from the given table function and arguments.
|
DataFrame |
Session.tableFunction(TableFunction func,
Map<String,Column> args) |
Creates a new DataFrame from the given table function and arguments.
|
DataFrame |
DataFrame.toDF(String... colNames) |
Creates a new DataFrame containing the data in the current DataFrame but in columns with the
specified names.
|
DataFrame |
DataFrame.union(DataFrame other) |
Returns a new DataFrame that contains all the rows in the current DataFrame and another
DataFrame (`other`), excluding any duplicate rows.
|
DataFrame |
DataFrame.unionAll(DataFrame other) |
Returns a new DataFrame that contains all the rows in the current DataFrame and another
DataFrame (`other`), including any duplicate rows.
|
DataFrame |
DataFrame.unionAllByName(DataFrame other) |
Returns a new DataFrame that contains all the rows in the current DataFrame and another
DataFrame (`other`), including any duplicate rows. Columns are matched by name rather than by position.
|
DataFrame |
DataFrame.unionByName(DataFrame other) |
Returns a new DataFrame that contains all the rows in the current DataFrame and another
DataFrame (`other`), excluding any duplicate rows. Columns are matched by name rather than by position.
|
DataFrame |
DataFrame.where(Column condition) |
Filters rows based on the specified conditional expression (similar to WHERE in SQL).
|
DataFrame |
DataFrame.withColumn(String colName,
Column col) |
Returns a DataFrame with an additional column with the specified name (`colName`).
|
DataFrame |
DataFrame.withColumns(String[] colNames,
Column[] values) |
Returns a DataFrame with additional columns with the specified names (`colNames`).
|
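
The set-style methods listed above (`union`, `unionAll`, `intersect`, `except`) differ mainly in how they treat duplicate rows. The sketch below is a plain-Java model of those semantics, not Snowpark code: `RowSetOps` and its methods are hypothetical names, each row is modeled as a `String`, and it assumes that `intersect` and `except` return distinct rows in the way SQL INTERSECT and EXCEPT do.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the row-set semantics described in the table.
// Hypothetical helper for illustration only; it does not call the Snowpark API.
// Results keep first-seen order here, whereas a real DataFrame guarantees no order
// unless it is sorted explicitly.
final class RowSetOps {
    private RowSetOps() {}

    // Models unionAll: every row from both inputs, duplicates kept.
    static List<String> unionAll(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Models union: rows from both inputs with duplicates removed.
    static List<String> union(List<String> left, List<String> right) {
        Set<String> distinct = new LinkedHashSet<>(left);
        distinct.addAll(right);
        return new ArrayList<>(distinct);
    }

    // Models intersect: distinct rows that appear in both inputs
    // (assumption: de-duplicated, as with SQL INTERSECT).
    static List<String> intersect(List<String> left, List<String> right) {
        Set<String> distinct = new LinkedHashSet<>(left);
        distinct.retainAll(right);
        return new ArrayList<>(distinct);
    }

    // Models except: distinct rows of the left input that do not appear in the right
    // (assumption: de-duplicated, as with SQL EXCEPT).
    static List<String> except(List<String> left, List<String> right) {
        Set<String> distinct = new LinkedHashSet<>(left);
        distinct.removeAll(right);
        return new ArrayList<>(distinct);
    }
}
```

With `left = [a, b, b, c]` and `right = [b, c, d]`, this model gives seven rows for `unionAll`, `[a, b, c, d]` for `union`, `[b, c]` for `intersect`, and `[a]` for `except`.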