java.lang.Object
- com.snowflake.snowpark.internal.Logging
- - com.snowflake.snowpark_java.DataFrame

All Implemented Interfaces:

Cloneable

Direct Known Subclasses:

CopyableDataFrame, HasCachedResult, Updatable
```
public class DataFrame
extends com.snowflake.snowpark.internal.Logging
implements Cloneable
```
Represents a lazily-evaluated relational dataset that contains a collection of Row objects with columns defined by a schema (column name and type).
A DataFrame is considered lazy because it encapsulates the computation or query required to produce a relational dataset. The computation is not performed until you call a method that performs an action (e.g. collect).
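The lazy behavior described above can be pictured with a small plain-Java sketch. It does not use Snowpark: `LazyDatasetSketch`, its `executions` counter, and its methods are hypothetical names chosen for illustration. The sketch assumes only that a transformation extends a stored plan and that an action such as `collect` runs it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazily evaluated dataset; not part of Snowpark.
final class LazyDatasetSketch {
    // Counts how many times the underlying "query" has actually run.
    static int executions = 0;

    private final Supplier<List<Integer>> plan;

    private LazyDatasetSketch(Supplier<List<Integer>> plan) {
        this.plan = plan;
    }

    static LazyDatasetSketch of(List<Integer> rows) {
        return new LazyDatasetSketch(() -> {
            executions++;
            return rows;
        });
    }

    // Transformation: only extends the plan, nothing is computed yet.
    LazyDatasetSketch filter(Predicate<Integer> condition) {
        return new LazyDatasetSketch(() -> {
            List<Integer> out = new ArrayList<>();
            for (Integer row : plan.get()) {
                if (condition.test(row)) {
                    out.add(row);
                }
            }
            return out;
        });
    }

    // Action: runs the accumulated plan and returns the rows.
    List<Integer> collect() {
        return plan.get();
    }

    public static void main(String[] args) {
        LazyDatasetSketch filtered = LazyDatasetSketch.of(List.of(1, 2, 3, 4)).filter(x -> x > 2);
        System.out.println("Executions before collect: " + executions);
        System.out.println("Rows: " + filtered.collect());
        System.out.println("Executions after collect: " + executions);
    }
}
```

In this model the counter stays at zero until `collect()` is called, which mirrors how a DataFrame defers its query until an action method runs.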

Since:

0.8.0

Field Summary
- Fields inherited from class com.snowflake.snowpark.internal.Logging
  logName

Method Summary

Modifier and Type	Method	Description
`DataFrame`	`agg(Column... exprs)`	Aggregate the data in the DataFrame.
`DataFrame`	`alias(String alias)`	Returns the current DataFrame aliased as the input alias name.
`DataFrameAsyncActor`	`async()`	Returns a DataFrameAsyncActor object that can be used to execute DataFrame actions asynchronously.
`HasCachedResult`	`cacheResult()`	Caches the content of this DataFrame to create a new cached DataFrame.
`DataFrame`	`clone()`	Returns a clone of this DataFrame.
`Column`	`col(String colName)`	Retrieves a reference to a column in this DataFrame.
`Row[]`	`collect()`	Executes the query representing this DataFrame and returns the result as an array of Row objects.
`long`	`count()`	Executes the query representing this DataFrame and returns the number of rows in the result (similar to the COUNT function in SQL).
`void`	`createOrReplaceTempView(String viewName)`	Creates a temporary view that returns the same results as this DataFrame.
`void`	`createOrReplaceTempView(String[] multipartIdentifier)`	Creates a temporary view that returns the same results as this DataFrame.
`void`	`createOrReplaceView(String viewName)`	Creates a view that captures the computation expressed by this DataFrame.
`void`	`createOrReplaceView(String[] multipartIdentifier)`	Creates a view that captures the computation expressed by this DataFrame.
`DataFrame`	`crossJoin(DataFrame right)`	Performs a cross join, which returns the cartesian product of the current DataFrame and another DataFrame (`right`).
`RelationalGroupedDataFrame`	`cube(Column... cols)`	Performs an SQL GROUP BY CUBE on the DataFrame.
`RelationalGroupedDataFrame`	`cube(String... colNames)`	Performs an SQL GROUP BY CUBE on the DataFrame.
`DataFrame`	`distinct()`	Returns a new DataFrame that contains only the rows with distinct values from the current DataFrame.
`DataFrame`	`drop(Column... columns)`	Returns a new DataFrame that excludes the specified columns from the output.
`DataFrame`	`drop(String... columnNames)`	Returns a new DataFrame that excludes the columns with the specified names from the output.
`DataFrame`	`dropDuplicates(String... colNames)`	Creates a new DataFrame by removing duplicated rows on given subset of columns.
`DataFrame`	`except(DataFrame other)`	Returns a new DataFrame that contains all the rows from the current DataFrame except for the rows that also appear in another DataFrame (`other`).
`void`	`explain()`	Prints the list of queries that will be executed to evaluate this DataFrame.
`DataFrame`	`filter(Column condition)`	Filters rows based on the specified conditional expression (similar to WHERE in SQL).
`Optional<Row>`	`first()`	Executes the query representing this DataFrame and returns the first row of results.
`Row[]`	`first(int n)`	Executes the query representing this DataFrame and returns the first `n` rows of the results.
`DataFrame`	`flatten(Column input)`	Flattens (explodes) compound values into multiple rows (similar to the SQL FLATTEN function).
`DataFrame`	`flatten(Column input, String path, boolean outer, boolean recursive, String mode)`	Flattens (explodes) compound values into multiple rows (similar to the SQL FLATTEN function).
`RelationalGroupedDataFrame`	`groupBy(Column... cols)`	Groups rows by the columns specified by expressions (similar to GROUP BY in SQL).
`RelationalGroupedDataFrame`	`groupBy(String... colNames)`	Groups rows by the columns specified by name (similar to GROUP BY in SQL).
`RelationalGroupedDataFrame`	`groupByGroupingSets(GroupingSets... sets)`	Performs an SQL GROUP BY GROUPING SETS on the DataFrame.
`DataFrame`	`intersect(DataFrame other)`	Returns a new DataFrame that contains the intersection of rows from the current DataFrame and another DataFrame (`other`).
`DataFrame`	`join(Column func)`	Joins the current DataFrame with the output of the specified table function `func`.
`DataFrame`	`join(Column func, Column[] partitionBy, Column[] orderBy)`	Joins the current DataFrame with the output of the specified table function `func`.
`DataFrame`	`join(DataFrame right)`	Performs a default inner join of the current DataFrame and another DataFrame (`right`).
`DataFrame`	`join(DataFrame right, Column joinExpr)`	Performs a default inner join of the current DataFrame and another DataFrame (`right`) using the join condition specified in an expression (`joinExpr`).
`DataFrame`	`join(DataFrame right, Column joinExpr, String joinType)`	Performs a join of the specified type (`joinType`) with the current DataFrame and another DataFrame (`right`) using the join condition specified in an expression (`joinExpr`).
`DataFrame`	`join(DataFrame right, String usingColumn)`	Performs a default inner join of the current DataFrame and another DataFrame (`right`) on a column (`usingColumn`).
`DataFrame`	`join(DataFrame right, String[] usingColumns)`	Performs a default inner join of the current DataFrame and another DataFrame (`right`) on a list of columns (`usingColumns`).
`DataFrame`	`join(DataFrame right, String[] usingColumns, String joinType)`	Performs a join of the specified type (`joinType`) with the current DataFrame and another DataFrame (`right`) on a list of columns (`usingColumns`).
`DataFrame`	`join(TableFunction func, Column... args)`	Joins the current DataFrame with the output of the specified table function `func`.
`DataFrame`	`join(TableFunction func, Column[] args, Column[] partitionBy, Column[] orderBy)`	Joins the current DataFrame with the output of the specified user-defined table function (UDTF) `func`.
`DataFrame`	`join(TableFunction func, Map<String,Column> args)`	Joins the current DataFrame with the output of the specified table function `func` that takes named parameters.
`DataFrame`	`join(TableFunction func, Map<String,Column> args, Column[] partitionBy, Column[] orderBy)`	Joins the current DataFrame with the output of the specified user-defined table function (UDTF) `func`.
`DataFrame`	`limit(int n)`	Returns a new DataFrame that contains at most `n` rows from the current DataFrame (similar to LIMIT in SQL).
`DataFrameNaFunctions`	`na()`	Returns a `DataFrameNaFunctions` object that provides functions for handling missing values in the DataFrame.
`DataFrame`	`naturalJoin(DataFrame right)`	Performs a natural join (a default inner join) of the current DataFrame and another DataFrame (`right`).
`DataFrame`	`naturalJoin(DataFrame right, String joinType)`	Performs a natural join of the specified type (`joinType`) with the current DataFrame and another DataFrame (`right`).
`RelationalGroupedDataFrame`	`pivot(Column pivotColumn, Object[] values)`	Rotates this DataFrame by turning the unique values from one column in the input expression into multiple columns and aggregating results where required on any remaining column values.
`RelationalGroupedDataFrame`	`pivot(String pivotColumn, Object[] values)`	Rotates this DataFrame by turning the unique values from one column in the input expression into multiple columns and aggregating results where required on any remaining column values.
`DataFrame[]`	`randomSplit(double[] weights)`	Randomly splits the current DataFrame into separate DataFrames, using the specified weights.
`DataFrame`	`rename(String newName, Column col)`	Returns a DataFrame with the specified column `col` renamed as `newName`.
`RelationalGroupedDataFrame`	`rollup(Column... cols)`	Performs an SQL GROUP BY ROLLUP on the DataFrame.
`RelationalGroupedDataFrame`	`rollup(String... colNames)`	Performs an SQL GROUP BY ROLLUP on the DataFrame.
`DataFrame`	`sample(double probabilityFraction)`	Returns a new DataFrame that contains a sampling of rows from the current DataFrame.
`DataFrame`	`sample(long num)`	Returns a new DataFrame with a sample of N rows from the underlying DataFrame.
`StructType`	`schema()`	Retrieves the definition of the columns in this DataFrame (the "relational schema" for the DataFrame).
`DataFrame`	`select(Column... columns)`	Generates a new DataFrame with the specified Column expressions as output (similar to SELECT in SQL).
`DataFrame`	`select(String... columnNames)`	Returns a new DataFrame with a subset of named columns (similar to SELECT in SQL).
`void`	`show()`	Evaluates this DataFrame and prints out the first ten rows.
`void`	`show(int n)`	Evaluates this DataFrame and prints out the first `n` rows.
`void`	`show(int n, int maxWidth)`	Evaluates this DataFrame and prints out the first `n` rows with the specified maximum number of characters per column.
`DataFrame`	`sort(Column... sortExprs)`	Sorts a DataFrame by the specified expressions (similar to ORDER BY in SQL).
`DataFrameStatFunctions`	`stat()`	Returns a DataFrameStatFunctions object that provides statistic functions.
`DataFrame`	`toDF(String... colNames)`	Creates a new DataFrame containing the data in the current DataFrame but in columns with the specified names.
`Iterator<Row>`	`toLocalIterator()`	Executes the query representing this DataFrame and returns an iterator of Row objects that you can use to retrieve the results.
`DataFrame`	`union(DataFrame other)`	Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame (`other`), excluding any duplicate rows.
`DataFrame`	`unionAll(DataFrame other)`	Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame (`other`), including any duplicate rows.
`DataFrame`	`unionAllByName(DataFrame other)`	Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame (`other`), including any duplicate rows.
`DataFrame`	`unionByName(DataFrame other)`	Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame (`other`), excluding any duplicate rows.
`DataFrame`	`where(Column condition)`	Filters rows based on the specified conditional expression (similar to WHERE in SQL).
`DataFrame`	`withColumn(String colName, Column col)`	Returns a DataFrame with an additional column with the specified name (`colName`).
`DataFrame`	`withColumns(String[] colNames, Column[] values)`	Returns a DataFrame with additional columns with the specified names (`colNames`).
`DataFrameWriter`	`write()`	Returns a DataFrameWriter object that you can use to write the data in the DataFrame to any supported destination.

Methods inherited from class com.snowflake.snowpark.internal.Logging
log, logDebug, logDebug, logError, logError, logInfo, logInfo, logTrace, logTrace, logWarning, logWarning, maskSecrets, maskSecrets

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - schema
```
public StructType schema()
```
    Retrieves the definition of the columns in this DataFrame (the "relational schema" for the DataFrame).
    
    Returns:
    
    A StructType object representing the DataFrame's schema
    
    Since:
    
    0.9.0
  - cacheResult
```
public HasCachedResult cacheResult()
```
    Caches the content of this DataFrame to create a new cached DataFrame.
    All subsequent operations on the returned cached DataFrame are performed on the cached data and have no effect on the original DataFrame.
    
    Returns:
    
    A HasCachedResult
    
    Since:
    
    0.12.0
  - explain
```
public void explain()
```
    Prints the list of queries that will be executed to evaluate this DataFrame. Prints the query execution plan if only one SELECT/DML/DDL statement will be executed.
    For more information about the query execution plan, see the EXPLAIN command.
    
    Since:
    
    0.12.0
  - toDF
```
public DataFrame toDF(String... colNames)
```
    Creates a new DataFrame containing the data in the current DataFrame but in columns with the specified names. The number of column names that you pass in must match the number of columns in the current DataFrame.
    
    Parameters:
    
    colNames - A list of column names.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - withColumn
```
public DataFrame withColumn(String colName,
                            Column col)
```
    Returns a DataFrame with an additional column with the specified name (`colName`). The column is computed by using the specified expression (`col`).
    If a column with the same name already exists in the DataFrame, that column is replaced by the new column.
    This example adds a new column named `mean_price` that contains the mean of the existing `price` column in the DataFrame.
    DataFrame dfWithMeanPriceCol = df.withColumn("mean_price", Functions.mean(df.col("price")));
    
    Parameters:
    
    colName - The name of the column to add or replace.
    
    col - The Column to add or replace.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.9.0
  - withColumns
```
public DataFrame withColumns(String[] colNames,
                             Column[] values)
```
    Returns a DataFrame with additional columns with the specified names (`colNames`). The columns are computed by using the specified expressions (`values`).
    If columns with the same names already exist in the DataFrame, those columns are replaced by the new columns.
    This example adds new columns named `mean_price` and `avg_price` that contain the mean and average of the existing `price` column.
    DataFrame dfWithAddedColumns = df.withColumns(
        new String[]{"mean_price", "avg_price"},
        new Column[]{Functions.mean(df.col("price")), Functions.avg(df.col("price"))});

    Parameters:
    
    colNames - A list of the names of the columns to add or replace.
    
    values - A list of the Column objects to add or replace.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - rename
```
public DataFrame rename(String newName,
                        Column col)
```
    Returns a DataFrame with the specified column `col` renamed as `newName`.
    This example renames the column `A` as `NEW_A` in the DataFrame.
    DataFrame df = session.sql("select 1 as A, 2 as B");
    DataFrame dfRenamed = df.rename("NEW_A", df.col("A"));

    Parameters:
    
    newName - The new name for the column
    
    col - The Column to be renamed
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - select
```
public DataFrame select(Column... columns)
```
    Generates a new DataFrame with the specified Column expressions as output (similar to SELECT in SQL). Only the Columns specified as arguments will be present in the resulting DataFrame.
    You can use any Column expression.
    For example:
    import com.snowflake.snowpark_java.Functions;
    DataFrame dfSelected = df.select(df.col("col1"), Functions.lit("abc"), df.col("col1").plus(df.col("col2")));

    Parameters:
    
    columns - The arguments of this select function
    
    Returns:
    
    The result DataFrame object
    
    Since:
    
    0.9.0
  - select
```
public DataFrame select(String... columnNames)
```
    Returns a new DataFrame with a subset of named columns (similar to SELECT in SQL).
    For example:
    DataFrame dfSelected = df.select("col1", "col2", "col3");
    
    Parameters:
    
    columnNames - A list of the names of columns to return.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - drop
```
public DataFrame drop(Column... columns)
```
    Returns a new DataFrame that excludes the specified columns from the output.
    This is functionally equivalent to calling select() and passing in all columns except the ones to exclude.
    
    Parameters:
    
    columns - An array of columns to exclude.
    
    Returns:
    
    A DataFrame
    
    Throws:
    
    com.snowflake.snowpark.SnowparkClientException - if the resulting DataFrame contains no output columns.
    
    Since:
    
    0.12.0
  - drop
```
public DataFrame drop(String... columnNames)
```
    Returns a new DataFrame that excludes the columns with the specified names from the output.
    This is functionally equivalent to calling select() and passing in all columns except the ones to exclude.
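    The equivalence with select() can be sketched on a plain list of column names. This is a standard-library model, not Snowpark code: `DropColumnsSketch` and `remainingColumns` are hypothetical names, names are assumed to match exactly, and `IllegalStateException` stands in for the `SnowparkClientException` listed under Throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of drop() over a list of column names; not Snowpark code.
final class DropColumnsSketch {
    // Returns the column names that the equivalent select() would keep.
    static List<String> remainingColumns(List<String> schema, String... toDrop) {
        List<String> kept = new ArrayList<>(schema);
        kept.removeAll(Arrays.asList(toDrop));
        if (kept.isEmpty()) {
            // Snowpark signals this case with SnowparkClientException; a standard
            // exception keeps this sketch free of external dependencies.
            throw new IllegalStateException("drop() would leave no output columns");
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(remainingColumns(List.of("A", "B", "C"), "B"));
    }
}
```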
    
    Parameters:
    
    columnNames - An array of the names of columns to exclude.
    
    Returns:
    
    A DataFrame
    
    Throws:
    
    com.snowflake.snowpark.SnowparkClientException - if the resulting DataFrame contains no output columns.
    
    Since:
    
    0.12.0
  - filter
```
public DataFrame filter(Column condition)
```
    Filters rows based on the specified conditional expression (similar to WHERE in SQL).
    For example:
    import com.snowflake.snowpark_java.Functions;
    DataFrame dfFiltered = df.filter(df.col("colA").gt(Functions.lit(1)));

    Parameters:
    
    condition - The filter condition defined as an expression on columns
    
    Returns:
    
    A filtered DataFrame
    
    Since:
    
    0.9.0
  - where
```
public DataFrame where(Column condition)
```
    Filters rows based on the specified conditional expression (similar to WHERE in SQL). This is equivalent to calling the filter(Column) method.
    For example:
    import com.snowflake.snowpark_java.Functions;
    DataFrame dfFiltered = df.where(df.col("colA").gt(Functions.lit(1)));

    Parameters:
    
    condition - The filter condition defined as an expression on columns
    
    Returns:
    
    A filtered DataFrame
    
    Since:
    
    0.9.0
  - agg
```
public DataFrame agg(Column... exprs)
```
    Aggregate the data in the DataFrame. Use this method if you don't need to group the data (`groupBy`).
    For the input value, pass in expressions that apply aggregation functions to columns (functions that are defined in the `Functions` class).
    The following example calculates the maximum value of the `num_sales` column and the mean value of the `price` column:
    df.agg(Functions.max(df.col("num_sales")), Functions.mean(df.col("price")))
    
    Parameters:
    
    exprs - A list of expressions on columns.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - distinct
```
public DataFrame distinct()
```
    Returns a new DataFrame that contains only the rows with distinct values from the current DataFrame.
    This is equivalent to performing a SELECT DISTINCT in SQL.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - dropDuplicates
```
public DataFrame dropDuplicates(String... colNames)
```
    Creates a new DataFrame by removing duplicated rows on a given subset of columns. If no subset of columns is specified, this function is the same as the distinct() function. The result is non-deterministic when duplicates are removed based on a subset of the columns rather than on all columns.
    For example, suppose we have a DataFrame `df` that contains three rows with the columns (a, b, c): (1, 1, 1), (1, 1, 2), (1, 2, 3).
    The result of df.dropDuplicates("a", "b") can be either (1, 1, 1), (1, 2, 3) or (1, 1, 2), (1, 2, 3).
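    One way to see why the result is non-deterministic is a plain-Java model that keeps the first row seen for each key. This is only a sketch under that assumption: `DropDuplicatesSketch` is a hypothetical name, and the server is not known to pick rows this way. It simply shows that the surviving row depends on the order in which rows are encountered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of dropDuplicates on a subset of columns; not Snowpark code.
final class DropDuplicatesSketch {
    // Keeps the first row seen for each distinct combination of the key columns.
    static List<List<Integer>> dropDuplicates(List<List<Integer>> rows, int... keyIndexes) {
        Map<List<Integer>, List<Integer>> firstByKey = new LinkedHashMap<>();
        for (List<Integer> row : rows) {
            List<Integer> key = new ArrayList<>();
            for (int index : keyIndexes) {
                key.add(row.get(index));
            }
            firstByKey.putIfAbsent(key, row);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = List.of(List.of(1, 1, 1), List.of(1, 1, 2), List.of(1, 2, 3));
        // Deduplicate on columns a and b (indexes 0 and 1).
        System.out.println(dropDuplicates(rows, 0, 1));
    }
}
```

    Feeding the same rows in a different order makes the model keep (1, 1, 2) instead of (1, 1, 1), which corresponds to the two possible results listed above.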
    
    Parameters:
    
    colNames - A list of column names
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - union
```
public DataFrame union(DataFrame other)
```
    Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame (`other`), excluding any duplicate rows. Both input DataFrames must contain the same number of columns.
    
    Parameters:
    
    other - The other DataFrame that contains the rows to include.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.9.0
  - unionAll
```
public DataFrame unionAll(DataFrame other)
```
    Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame (`other`), including any duplicate rows. Both input DataFrames must contain the same number of columns.
    For example:
    DataFrame df1and2 = df1.unionAll(df2);
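    The difference between keeping and excluding duplicates (unionAll versus union) can be modeled on plain lists. `UnionSketch` is a hypothetical, standard-library-only illustration. It preserves first-seen order for readability, which is an assumption of the sketch and not a guarantee made by the DataFrame methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical list-based model of unionAll and union; not Snowpark code.
final class UnionSketch {
    // Concatenates both inputs and keeps every row, duplicates included.
    static <T> List<T> unionAll(List<T> left, List<T> right) {
        List<T> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Concatenates both inputs and then removes duplicate rows.
    static <T> List<T> union(List<T> left, List<T> right) {
        return new ArrayList<>(new LinkedHashSet<>(unionAll(left, right)));
    }

    public static void main(String[] args) {
        System.out.println(unionAll(List.of(1, 2, 2), List.of(2, 3)));
        System.out.println(union(List.of(1, 2, 2), List.of(2, 3)));
    }
}
```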
    
    Parameters:
    
    other - The other DataFrame that contains the rows to include.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - unionByName
```
public DataFrame unionByName(DataFrame other)
```
    Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame (`other`), excluding any duplicate rows.
    This method matches the columns in the two DataFrames by their names, not by their positions. The columns in the other DataFrame are rearranged to match the order of columns in the current DataFrame.
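    The column rearrangement described above can be sketched for a single row. `UnionByNameSketch` and `alignRow` are hypothetical names, and the exact-name lookup is an assumption of this model rather than a statement about how Snowpark resolves identifiers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of matching columns by name before a union; not Snowpark code.
final class UnionByNameSketch {
    // Reorders one row of the other DataFrame so that its values follow
    // the column order of the current DataFrame.
    static List<Object> alignRow(List<String> targetColumns, List<String> sourceColumns, List<?> row) {
        List<Object> aligned = new ArrayList<>();
        for (String name : targetColumns) {
            int index = sourceColumns.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("Column not found: " + name);
            }
            aligned.add(row.get(index));
        }
        return aligned;
    }

    public static void main(String[] args) {
        // The current DataFrame has columns (A, B); the other one has (B, A).
        System.out.println(alignRow(List.of("A", "B"), List.of("B", "A"), List.of(2, 1)));
    }
}
```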
    
    Parameters:
    
    other - The other DataFrame that contains the rows to include.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - unionAllByName
```
public DataFrame unionAllByName(DataFrame other)
```
    Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame (`other`), including any duplicate rows.
    This method matches the columns in the two DataFrames by their names, not by their positions. The columns in the other DataFrame are rearranged to match the order of columns in the current DataFrame.
    
    Parameters:
    
    other - The other DataFrame that contains the rows to include.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - intersect
```
public DataFrame intersect(DataFrame other)
```
    Returns a new DataFrame that contains the intersection of rows from the current DataFrame and another DataFrame (`other`). Duplicate rows are eliminated.
    
    Parameters:
    
    other - The other DataFrame that contains the rows to use for the intersection.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - except
```
public DataFrame except(DataFrame other)
```
    Returns a new DataFrame that contains all the rows from the current DataFrame except for the rows that also appear in another DataFrame (`other`). Duplicate rows are eliminated.
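    The set semantics of this method, and of intersect, can be modeled with standard collections. `SetOperationsSketch` is a hypothetical illustration, not Snowpark code. The point it shows is that duplicates in the current DataFrame are also removed from the result.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical list-based model of except and intersect; not Snowpark code.
final class SetOperationsSketch {
    // Distinct rows of left that do not appear in right.
    static <T> List<T> except(List<T> left, List<T> right) {
        Set<T> result = new LinkedHashSet<>(left);
        result.removeAll(right);
        return new ArrayList<>(result);
    }

    // Distinct rows that appear in both left and right.
    static <T> List<T> intersect(List<T> left, List<T> right) {
        Set<T> result = new LinkedHashSet<>(left);
        result.retainAll(right);
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        System.out.println(except(List.of(1, 1, 2, 3), List.of(3, 4)));
        System.out.println(intersect(List.of(1, 1, 2, 3), List.of(1, 3, 3)));
    }
}
```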
    
    Parameters:
    
    other - The DataFrame that contains the rows to exclude.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - clone
```
public DataFrame clone()
```
    Returns a clone of this DataFrame.
    
    Overrides:
    
    clone in class Object
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - join
```
public DataFrame join(DataFrame right)
```
    Performs a default inner join of the current DataFrame and another DataFrame (`right`).
    Because this method does not specify a join condition, the returned DataFrame is a cartesian product of the two DataFrames.
    If the current and `right` DataFrames have columns with the same name, and you need to refer to one of these columns in the returned DataFrame, use the col function on the current or `right` DataFrame to disambiguate references to these columns.
    
    Parameters:
    
    right - The other DataFrame to join.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.1.0
  - join
```
public DataFrame join(DataFrame right,
                      String usingColumn)
```
    Performs a default inner join of the current DataFrame and another DataFrame (`right`) on a column (`usingColumn`).
    The method assumes that the `usingColumn` column has the same meaning in the left and right DataFrames.
    For example: left.join(right, "col")
    
    Parameters:
    
    right - The other DataFrame to join.
    
    usingColumn - The name of the column to use for the join.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - join
```
public DataFrame join(DataFrame right,
                      String[] usingColumns)
```
    Performs a default inner join of the current DataFrame and another DataFrame (`right`) on a list of columns (`usingColumns`).
    The method assumes that the columns in `usingColumns` have the same meaning in the left and right DataFrames.
    For example: left.join(right, new String[]{"col1", "col2"})
    
    Parameters:
    
    right - The other DataFrame to join.
    
    usingColumns - A list of the names of the columns to use for the join.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - join
```
public DataFrame join(DataFrame right,
                      String[] usingColumns,
                      String joinType)
```
    Performs a join of the specified type (`joinType`) with the current DataFrame and another DataFrame (`right`) on a list of columns (`usingColumns`).
    The method assumes that the columns in `usingColumns` have the same meaning in the left and right DataFrames.
    For example:
    left.join(right, new String[]{"col"}, "left");
    left.join(right, new String[]{"col1", "col2"}, "outer");

    Parameters:
    
    right - The other DataFrame to join.
    
    usingColumns - A list of the names of the columns to use for the join.
    
    joinType - The type of join (e.g. "right", "outer", etc.).
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - join
```
public DataFrame join(DataFrame right,
                      Column joinExpr)
```
    Performs a default inner join of the current DataFrame and another DataFrame (`right`) using the join condition specified in an expression (`joinExpr`).
    To disambiguate columns with the same name in the left DataFrame and right DataFrame, use the col() method of each DataFrame. You can use this approach to disambiguate columns in the `joinExpr` parameter and to refer to columns in the returned DataFrame.
    For example: df1.join(df2, df1.col("col1").equal_to(df2.col("col2")))
    If you need to join a DataFrame with itself, keep in mind that there is no way to distinguish between columns on the left and right sides in a join expression. For example:
    df.join(df, df.col("a").equal_to(df.col("b")))
    As a workaround, you can either construct the left and right DataFrames separately, or you can call a join(DataFrame, String[]) method that allows you to pass in the `usingColumns` parameter.
    
    Parameters:
    
    right - The other DataFrame to join.
    
    joinExpr - Expression that specifies the join condition.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - join
```
public DataFrame join(DataFrame right,
                      Column joinExpr,
                      String joinType)
```
    Performs a join of the specified type (`joinType`) with the current DataFrame and another DataFrame (`right`) using the join condition specified in an expression (`joinExpr`).
    To disambiguate columns with the same name in the left DataFrame and right DataFrame, use the col() method of each DataFrame. You can use this approach to disambiguate columns in the `joinExpr` parameter and to refer to columns in the returned DataFrame.
    For example: df1.join(df2, df1.col("col1").equal_to(df2.col("col2")))
    If you need to join a DataFrame with itself, keep in mind that there is no way to distinguish between columns on the left and right sides in a join expression. For example:
    df.join(df, df.col("a").equal_to(df.col("b")))
    As a workaround, you can either construct the left and right DataFrames separately, or you can call a join(DataFrame, String[]) method that allows you to pass in the `usingColumns` parameter.
    
    Parameters:
    
    right - The other DataFrame to join.
    
    joinExpr - Expression that specifies the join condition.
    
    joinType - The type of join (e.g. "right", "outer", etc.).
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - crossJoin
```
public DataFrame crossJoin(DataFrame right)
```
    Performs a cross join, which returns the cartesian product of the current DataFrame and another DataFrame (`right`).
    If the current and `right` DataFrames have columns with the same name, and you need to refer to one of these columns in the returned DataFrame, use the col function on the current or `right` DataFrame to disambiguate references to these columns.
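    As a reminder of what a cartesian product yields, the following standard-library sketch pairs every row of one list with every row of another. `CrossJoinSketch` is a hypothetical name, and each row is reduced to a single string to keep the model small.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-based model of a cross join; not Snowpark code.
final class CrossJoinSketch {
    // Pairs every left row with every right row.
    static List<String> crossJoin(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(l + "," + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 2 left rows and 3 right rows produce 2 * 3 = 6 output rows.
        System.out.println(crossJoin(List.of("a", "b"), List.of("x", "y", "z")));
    }
}
```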
    
    Parameters:
    
    right - The other DataFrame to join.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - naturalJoin
```
public DataFrame naturalJoin(DataFrame right)
```
    Performs a natural join (a default inner join) of the current DataFrame and another DataFrame (`right`).
    
    Parameters:
    
    right - The other DataFrame to join.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - naturalJoin
```
public DataFrame naturalJoin(DataFrame right,
                             String joinType)
```
    Performs a natural join of the specified type (`joinType`) with the current DataFrame and another DataFrame (`right`).
    
    Parameters:
    
    right - The other DataFrame to join.
    
    joinType - The type of join (e.g. "right", "outer", etc.).
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - sort
```
public DataFrame sort(Column... sortExprs)
```
    Sorts a DataFrame by the specified expressions (similar to ORDER BY in SQL).
    For example:
    DataFrame dfSorted = df.sort(df.col("colA"), df.col("colB").desc());
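    The ordering requested in this example (first key ascending, second key descending) can be modeled with `java.util.Comparator`. `SortSketch` is a hypothetical illustration on in-memory rows. It does not model NULL ordering or any other server-side behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of a two-key sort; not Snowpark code.
final class SortSketch {
    // Orders rows {colA, colB} by colA ascending, then by colB descending.
    static List<int[]> sortByAThenBDesc(List<int[]> rows) {
        Comparator<int[]> byA = Comparator.comparingInt((int[] row) -> row[0]);
        Comparator<int[]> byBDesc = Comparator.comparingInt((int[] row) -> row[1]).reversed();
        List<int[]> sorted = new ArrayList<>(rows);
        sorted.sort(byA.thenComparing(byBDesc));
        return sorted;
    }

    public static void main(String[] args) {
        List<int[]> rows = List.of(new int[]{2, 1}, new int[]{1, 1}, new int[]{1, 3});
        for (int[] row : sortByAThenBDesc(rows)) {
            System.out.println(row[0] + ", " + row[1]);
        }
    }
}
```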
    Parameters:
    
    sortExprs - A list of Column expressions for sorting the DataFrame
    
    Returns:
    
    The sorted DataFrame
    
    Since:
    
    0.9.0
  - limit
```
public DataFrame limit(int n)
```
    Returns a new DataFrame that contains at most `n` rows from the current DataFrame (similar to LIMIT in SQL).
    Note that this is a transformation method and not an action method.
    
    Parameters:
    
    n - Number of rows to return.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    0.12.0
  - groupBy
```
public RelationalGroupedDataFrame groupBy(Column... cols)
```
    Groups rows by the columns specified by expressions (similar to GROUP BY in SQL).
    
    Parameters:
    
    cols - An array of expressions on columns.
    
    Returns:
    
    A RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.
    
    Since:
    
    0.9.0
  - groupBy
```
public RelationalGroupedDataFrame groupBy(String... colNames)
```
    Groups rows by the columns specified by name (similar to GROUP BY in SQL).
    This method returns a RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.
    
    Parameters:
    
    colNames - A list of the names of columns to group by.
    
    Returns:
    
    A RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.
    
    Since:
    
    1.1.0
  - rollup
```
public RelationalGroupedDataFrame rollup(Column... cols)
```
    Performs an SQL GROUP BY ROLLUP on the DataFrame.
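    In SQL, ROLLUP over n columns aggregates over n + 1 grouping sets: the full column list, then each shorter prefix, down to the empty set (the grand total). The following standard-library sketch only enumerates those sets. `RollupSketch` is a hypothetical name, and the code does not call Snowpark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that lists the grouping sets implied by ROLLUP; not Snowpark code.
final class RollupSketch {
    // Returns the prefixes of the column list, from the full list down to the empty set.
    static List<List<String>> rollupSets(String... cols) {
        List<List<String>> sets = new ArrayList<>();
        for (int length = cols.length; length >= 0; length--) {
            sets.add(new ArrayList<>(Arrays.asList(cols).subList(0, length)));
        }
        return sets;
    }

    public static void main(String[] args) {
        System.out.println(rollupSets("a", "b"));
    }
}
```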
    
    Parameters:
    
    cols - A list of expressions on columns.
    
    Returns:
    
    A RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.
    
    Since:
    
    1.1.0
  - rollup
```
public RelationalGroupedDataFrame rollup(String... colNames)
```
    Performs an SQL GROUP BY ROLLUP on the DataFrame.
    
    Parameters:
    
    colNames - A list of column names.
    
    Returns:
    
    A RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.
    
    Since:
    
    1.1.0
  - cube
```
public RelationalGroupedDataFrame cube(Column... cols)
```
    Performs an SQL GROUP BY CUBE on the DataFrame.
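    In SQL, CUBE over n columns aggregates over every subset of those columns, which gives 2^n grouping sets. The sketch below only enumerates the subsets with a bit mask. `CubeSketch` is a hypothetical name, and the order in which the sets are listed is an arbitrary choice of this model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the grouping sets implied by CUBE; not Snowpark code.
final class CubeSketch {
    // Returns every subset of the given columns, starting with the full set.
    static List<List<String>> cubeSets(String... cols) {
        List<List<String>> sets = new ArrayList<>();
        int n = cols.length;
        for (int mask = (1 << n) - 1; mask >= 0; mask--) {
            List<String> set = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    set.add(cols[i]);
                }
            }
            sets.add(set);
        }
        return sets;
    }

    public static void main(String[] args) {
        System.out.println(cubeSets("a", "b"));
    }
}
```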
    
    Parameters:
    
    cols - A list of expressions for columns to use.
    
    Returns:
    
    A RelationalGroupedDataFrame
    
    Since:
    
    0.9.0
  - cube
```
public RelationalGroupedDataFrame cube(String... colNames)
```
    Performs an SQL GROUP BY CUBE on the DataFrame.
    
    Parameters:
    
    colNames - A list of column names.
    
    Returns:
    
    A RelationalGroupedDataFrame
    
    Since:
    
    1.1.0
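    For comparison with rollup, GROUP BY CUBE aggregates over every subset of the listed columns. The plain-Java sketch below enumerates those subsets. It does not call Snowpark; `CubeSketch` and `cubeSets` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: this class is not part of Snowpark.
class CubeSketch {
    // Returns the grouping sets implied by CUBE over the given columns:
    // all 2^n subsets, from every column down to the empty set.
    static List<List<String>> cubeSets(List<String> cols) {
        List<List<String>> sets = new ArrayList<>();
        int n = cols.size();
        // Each bit mask selects one subset of the columns.
        for (int mask = (1 << n) - 1; mask >= 0; mask--) {
            List<String> subset = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // The highest bit maps to the first column, so column order is kept.
                if ((mask & (1 << (n - 1 - i))) != 0) {
                    subset.add(cols.get(i));
                }
            }
            sets.add(subset);
        }
        return sets;
    }

    public static void main(String[] args) {
        // CUBE(a, b) groups by (a, b), (a), (b), and ().
        System.out.println(cubeSets(List.of("a", "b")));
    }
}
```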
  - groupByGroupingSets
```
public RelationalGroupedDataFrame groupByGroupingSets(GroupingSets... sets)
```
    Performs an SQL GROUP BY GROUPING SETS on the DataFrame.
    GROUP BY GROUPING SETS is an extension of the GROUP BY clause that computes multiple GROUP BY clauses in a single statement. A grouping set is a set of dimension columns.
    GROUP BY GROUPING SETS is equivalent to the UNION of two or more GROUP BY operations in the same result set. For example:
    df.groupByGroupingSets(GroupingSets.create(Set.of(df.col("a")))) is equivalent to df.groupBy("a")
    and
    df.groupByGroupingSets(GroupingSets.create(Set.of(df.col("a")), Set.of(df.col("b")))) is equivalent to df.groupBy("a") 'union' df.groupBy("b")
    
    Parameters:
    
    sets - A list of GroupingSets objects.
    
    Returns:
    
    A RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.
    
    Since:
    
    1.1.0
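    The "union of GROUP BY operations" described above can be modelled in plain Java. The sketch below counts rows once per grouping set and concatenates the results. It does not call Snowpark; `GroupingSetsSketch`, its methods, and the two-column row layout are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: this class is not part of Snowpark.
class GroupingSetsSketch {
    // Counts rows per distinct value of the column at index col
    // (a single-column GROUP BY with COUNT).
    static Map<String, Long> countBy(List<String[]> rows, int col) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            counts.merge(row[col], 1L, Long::sum);
        }
        return counts;
    }

    // Models GROUPING SETS ((a), (b)) over rows of the form {a, b}:
    // one GROUP BY per set, with the results concatenated. Each output line
    // is "a, b, count", with null for the column that is not in the set.
    static List<String> groupingSets(List<String[]> rows) {
        List<String> out = new ArrayList<>();
        countBy(rows, 0).forEach((a, n) -> out.add(a + ", null, " + n));
        countBy(rows, 1).forEach((b, n) -> out.add("null, " + b + ", " + n));
        return out;
    }
}
```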
  - pivot
```
public RelationalGroupedDataFrame pivot(Column pivotColumn,
                                        Object[] values)
```
    Rotates this DataFrame by turning the unique values from one column in the input expression into multiple columns and aggregating results where required on any remaining column values.
    Only one aggregate is supported with pivot.
    For example:
    DataFrame dfPivoted = df.pivot(df.col("col1"), new Object[] {1, 2, 3}).agg(sum(df.col("col2")));
    
    Parameters:
    
    pivotColumn - The column to use.
    
    values - An array of values in the column.
    
    Returns:
    
    A RelationalGroupedDataFrame
    
    Since:
    
    1.2.0
  - pivot
```
public RelationalGroupedDataFrame pivot(String pivotColumn,
                                        Object[] values)
```
    Rotates this DataFrame by turning the unique values from one column in the input expression into multiple columns and aggregating results where required on any remaining column values.
    Only one aggregate is supported with pivot.
    For example:
    DataFrame dfPivoted = df.pivot("col1", new Object[] {1, 2, 3}).agg(sum(df.col("col2")));
    
    Parameters:
    
    pivotColumn - The name of the column to use.
    
    values - An array of values in the column.
    
    Returns:
    
    A RelationalGroupedDataFrame
    
    Since:
    
    1.2.0
  - count
```
public long count()
```
    Executes the query representing this DataFrame and returns the number of rows in the result (similar to the COUNT function in SQL). This is an action method.
    
    Returns:
    
    The number of rows.
    
    Since:
    
    0.8.0
  - col
```
public Column col(String colName)
```
    Retrieves a reference to a column in this DataFrame.
    
    Parameters:
    
    colName - The name of the column
    
    Returns:
    
    The target column
    
    Since:
    
    0.9.0
  - alias
```
public DataFrame alias(String alias)
```
    Returns the current DataFrame aliased as the input alias name.
    For example:
    DataFrame df2 = df.alias("A");
    df2.select(df2.col("A.num"));
    
    Parameters:
    
    alias - The alias name of the DataFrame.
    
    Returns:
    
    A DataFrame
    
    Since:
    
    1.10.0
  - collect
```
public Row[] collect()
```
    Executes the query representing this DataFrame and returns the result as an array of Row objects.
    
    Returns:
    
    The result array
    
    Since:
    
    0.9.0
  - toLocalIterator
```
public Iterator<Row> toLocalIterator()
```
    Executes the query representing this DataFrame and returns an iterator of Row objects that you can use to retrieve the results.
    Unlike the collect method, this method does not load all data into memory at once.
    
    Returns:
    
    An Iterator of Row
    
    Since:
    
    0.12.0
  - show
```
public void show()
```
    Evaluates this DataFrame and prints out the first ten rows.
    
    Since:
    
    0.9.0
  - show
```
public void show(int n)
```
    Evaluates this DataFrame and prints out the first `n` rows.
    
    Parameters:
    
    n - The number of rows to print out.
    
    Since:
    
    0.12.0
  - show
```
public void show(int n,
                 int maxWidth)
```
    Evaluates this DataFrame and prints out the first `n` rows with the specified maximum number of characters per column.
    
    Parameters:
    
    n - The number of rows to print out.
    
    maxWidth - The maximum number of characters to print out for each column. If the number of characters exceeds the maximum, the method prints out an ellipsis (...) at the end of the column.
    
    Since:
    
    0.12.0
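    The effect of `maxWidth` can be pictured with a small plain-Java helper. This is a sketch under a stated assumption, not Snowpark's implementation: it assumes the ellipsis counts toward the maximum width, which the description above does not confirm. `TruncateSketch` and `truncate` are hypothetical names.

```java
// Illustrative model only: this class is not part of Snowpark.
class TruncateSketch {
    // Shortens a cell value to at most maxWidth characters, ending in "..."
    // when the value is too long.
    // Assumption: the three dots count toward maxWidth. For widths below 3,
    // the result is just the ellipsis.
    static String truncate(String cell, int maxWidth) {
        if (cell.length() <= maxWidth) {
            return cell;
        }
        int keep = Math.max(0, maxWidth - 3);
        return cell.substring(0, keep) + "...";
    }

    public static void main(String[] args) {
        System.out.println(truncate("abcdefghij", 6));
        System.out.println(truncate("abc", 6));
    }
}
```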
  - createOrReplaceView
```
public void createOrReplaceView(String viewName)
```
    Creates a view that captures the computation expressed by this DataFrame.
    For `viewName`, you can include the database and schema name (i.e. specify a fully-qualified name). If no database name or schema name is specified, the view is created in the current database or schema.
    `viewName` must be a valid Snowflake identifier.
    
    Parameters:
    
    viewName - The name of the view to create or replace.
    
    Since:
    
    0.12.0
  - createOrReplaceView
```
public void createOrReplaceView(String[] multipartIdentifier)
```
    Creates a view that captures the computation expressed by this DataFrame.
    In `multipartIdentifier`, you can include the database and schema name to specify a fully-qualified name. If no database name or schema name is specified, the view is created in the current database or schema.
    The view name must be a valid Snowflake identifier.
    
    Parameters:
    
    multipartIdentifier - A sequence of strings that specifies the database name, schema name, and view name.
    
    Since:
    
    0.12.0
  - createOrReplaceTempView
```
public void createOrReplaceTempView(String viewName)
```
    Creates a temporary view that returns the same results as this DataFrame.
    You can use the view in subsequent SQL queries and statements during the current session. The temporary view is only available in the session in which it is created.
    For `viewName`, you can include the database and schema name (i.e. specify a fully-qualified name). If no database name or schema name is specified, the view is created in the current database or schema.
    `viewName` must be a valid Snowflake identifier.
    
    Parameters:
    
    viewName - The name of the view to create or replace.
    
    Since:
    
    0.12.0
  - createOrReplaceTempView
```
public void createOrReplaceTempView(String[] multipartIdentifier)
```
    Creates a temporary view that returns the same results as this DataFrame.
    You can use the view in subsequent SQL queries and statements during the current session. The temporary view is only available in the session in which it is created.
    In `multipartIdentifier`, you can include the database and schema name to specify a fully-qualified name. If no database name or schema name is specified, the view is created in the current database or schema.
    The view name must be a valid Snowflake identifier.
    
    Parameters:
    
    multipartIdentifier - A sequence of strings that specify the database name, schema name, and view name.
    
    Since:
    
    0.12.0
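    To relate the `String[]` overloads to the single-string overloads, the sketch below joins the parts of a multipart identifier into a dotted name. It is plain Java and makes assumptions: the 1-to-3 part limit and the simple dot join (with no quoting) are illustrative choices, not documented Snowpark behavior. `ViewNameSketch` and `qualifiedName` are hypothetical names.

```java
// Illustrative model only: this class is not part of Snowpark.
class ViewNameSketch {
    // Joins {database, schema, view}, {schema, view}, or {view} with dots.
    // Assumption: each part is already a valid Snowflake identifier,
    // so no quoting or escaping is applied here.
    static String qualifiedName(String[] multipartIdentifier) {
        if (multipartIdentifier.length < 1 || multipartIdentifier.length > 3) {
            throw new IllegalArgumentException("expected 1 to 3 name parts");
        }
        return String.join(".", multipartIdentifier);
    }

    public static void main(String[] args) {
        System.out.println(qualifiedName(new String[] {"my_db", "my_schema", "my_view"}));
    }
}
```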
  - first
```
public Optional<Row> first()
```
    Executes the query representing this DataFrame and returns the first row of results.
    
    Returns:
    
    An Optional Row.
    
    Since:
    
    0.12.0
  - first
```
public Row[] first(int n)
```
    Executes the query representing this DataFrame and returns the first n rows of the results.
    
    Parameters:
    
    n - The number of rows to return.
    
    Returns:
    
    An Array of the first n Row objects. If n is negative or larger than the number of rows in the results, returns all rows in the results.
    
    Since:
    
    0.12.0
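    The handling of out-of-range values of `n` described above can be modelled on an ordinary list. The sketch is plain Java and does not call Snowpark; `FirstSketch` is a hypothetical name, and a generic `List` stands in for the query result.

```java
import java.util.List;

// Illustrative model only: this class is not part of Snowpark.
class FirstSketch {
    // Returns the first n elements. A negative n, or an n larger than the
    // number of rows, returns every row.
    static <T> List<T> first(List<T> rows, int n) {
        if (n < 0 || n > rows.size()) {
            return rows;
        }
        return rows.subList(0, n);
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(1, 2, 3);
        System.out.println(first(rows, 2));
        System.out.println(first(rows, -1));
        System.out.println(first(rows, 10));
    }
}
```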
  - sample
```
public DataFrame sample(long num)
```
    Returns a new DataFrame with a sample of N rows from the underlying DataFrame.
    NOTE:
    - If the row count in the DataFrame is larger than the requested number of rows, the method returns a DataFrame containing the number of requested rows.
    - If the row count in the DataFrame is smaller than the requested number of rows, the method returns a DataFrame containing all rows.
    
    Parameters:
    
    num - The number of rows to sample in the range of 0 to 1,000,000.
    
    Returns:
    
    A DataFrame containing the sample of num rows.
    
    Since:
    
    0.12.0
  - sample
```
public DataFrame sample(double probabilityFraction)
```
    Returns a new DataFrame that contains a sampling of rows from the current DataFrame.
    NOTE:
    - The number of rows returned may be close to (but not exactly equal to) (probabilityFraction * totalRowCount).
    - The Snowflake SAMPLE supports specifying 'probability' as a percentage number. The range of 'probability' is [0.0, 100.0]. The conversion formula is probability = probabilityFraction * 100.
    
    Parameters:
    
    probabilityFraction - The fraction of rows to sample. This must be in the range of `0.0` to `1.0`.
    
    Returns:
    
    A DataFrame containing the sample of rows.
    
    Since:
    
    0.12.0
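    The conversion formula in the note above is simple enough to check by hand. A minimal plain-Java sketch follows. The range check that throws an exception is an assumption for illustration, since the description does not say how out-of-range values are handled. `SampleSketch` and `toPercent` are hypothetical names.

```java
// Illustrative model only: this class is not part of Snowpark.
class SampleSketch {
    // Converts a fraction in [0.0, 1.0] to the percentage form used by
    // the SQL SAMPLE clause: probability = probabilityFraction * 100.
    static double toPercent(double probabilityFraction) {
        // Assumption: out-of-range input is rejected with an exception.
        if (probabilityFraction < 0.0 || probabilityFraction > 1.0) {
            throw new IllegalArgumentException("probabilityFraction must be in [0.0, 1.0]");
        }
        return probabilityFraction * 100.0;
    }

    public static void main(String[] args) {
        // A fraction of 0.25 corresponds to SAMPLE (25).
        System.out.println(toPercent(0.25));
    }
}
```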
  - randomSplit
```
public DataFrame[] randomSplit(double[] weights)
```
    Randomly splits the current DataFrame into separate DataFrames, using the specified weights.
    NOTE:
    - If only one weight is specified, the returned DataFrame array only includes the current DataFrame.
    - If multiple weights are specified, the current DataFrame will be cached before being split.
    
    Parameters:
    
    weights - Weights to use for splitting the DataFrame. If the weights don't add up to 1, the weights will be normalized.
    
    Returns:
    
    An array of DataFrame objects
    
    Since:
    
    0.12.0
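    Weight normalization can be sketched in plain Java as follows. The `normalize` method mirrors the statement that weights not summing to 1 are normalized. The `bucket` method is one common way to turn normalized weights into split assignments and is an assumption, not a description of how Snowpark assigns rows. `SplitSketch` and both method names are hypothetical.

```java
// Illustrative model only: this class is not part of Snowpark.
class SplitSketch {
    // Scales the weights so that they sum to 1.
    static double[] normalize(double[] weights) {
        double sum = 0;
        for (double w : weights) {
            sum += w;
        }
        double[] out = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            out[i] = weights[i] / sum;
        }
        return out;
    }

    // Assumed assignment scheme: maps a random draw in [0, 1) to a split
    // index by walking the cumulative normalized weights.
    static int bucket(double[] normalized, double draw) {
        double upper = 0;
        for (int i = 0; i < normalized.length; i++) {
            upper += normalized[i];
            if (draw < upper) {
                return i;
            }
        }
        // Guards against floating-point rounding at the top of the range.
        return normalized.length - 1;
    }

    public static void main(String[] args) {
        double[] n = normalize(new double[] {1, 3});
        System.out.println(n[0] + " " + n[1]);
    }
}
```

    With weights {1, 3}, the normalized weights are 0.25 and 0.75, so roughly a quarter of the rows would be expected in the first split.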
  - flatten
```
public DataFrame flatten(Column input)
```
    Flattens (explodes) compound values into multiple rows (similar to the SQL FLATTEN function).
    The `flatten` method adds the following columns to the returned DataFrame:
    - SEQ
    - KEY
    - PATH
    - INDEX
    - VALUE
    - THIS
    If this DataFrame also has columns with the names above, you can disambiguate the columns by calling col("value") on the DataFrame that owns the column.
    For example, if the current DataFrame has a column named `value`:
    DataFrame df = session.sql("select parse_json(value) as value from values('[1,2]') as T(value)");
    DataFrame flattened = df.flatten(df.col("value"));
    flattened.select(df.col("value"), flattened.col("value").as("newValue")).show();
    
    Parameters:
    
    input - The expression that will be unnested into rows. The expression must be of data type VARIANT, OBJECT, or ARRAY.
    
    Returns:
    
    A DataFrame containing the flattened values.
    
    Since:
    
    0.12.0
  - flatten
```
public DataFrame flatten(Column input,
                         String path,
                         boolean outer,
                         boolean recursive,
                         String mode)
```
    Flattens (explodes) compound values into multiple rows (similar to the SQL FLATTEN function).
    The `flatten` method adds the following columns to the returned DataFrame:
    - SEQ
    - KEY
    - PATH
    - INDEX
    - VALUE
    - THIS
    If this DataFrame also has columns with the names above, you can disambiguate the columns by calling col("value") on the DataFrame that owns the column.
    For example, if the current DataFrame has a column named `value`:
    DataFrame df = session.sql("select parse_json(value) as value from values('[1,2]') as T(value)");
    DataFrame flattened = df.flatten(df.col("value"), "", false, false, "both");
    flattened.select(df.col("value"), flattened.col("value").as("newValue")).show();
    
    Parameters:
    
    input - The expression that will be unnested into rows. The expression must be of data type VARIANT, OBJECT, or ARRAY.
    
    path - The path to the element within a VARIANT data structure which needs to be flattened. Can be a zero-length string (i.e. empty path) if the outermost element is to be flattened.
    
    outer - If FALSE, any input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries, are completely omitted from the output. Otherwise, exactly one row is generated for zero-row expansions (with NULL in the KEY, INDEX, and VALUE columns).
    
    recursive - If FALSE, only the element referenced by PATH is expanded. Otherwise, the expansion is performed for all sub-elements recursively.
    
    mode - Specifies whether only OBJECT, ARRAY, or BOTH should be flattened.
    
    Returns:
    
    A DataFrame containing the flattened values.
    
    Since:
    
    0.12.0
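    The effect of the `outer` parameter can be modelled in plain Java with nested lists standing in for ARRAY values. This sketch does not call Snowpark and covers only the INDEX and VALUE columns. `FlattenSketch` and its string row format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: this class is not part of Snowpark.
class FlattenSketch {
    // Expands each input array into one output row per element.
    // An empty array is a zero-row expansion: it is omitted when outer is
    // false and produces a single row of NULLs when outer is true.
    static List<String> flatten(List<List<String>> input, boolean outer) {
        List<String> rows = new ArrayList<>();
        for (List<String> array : input) {
            if (array.isEmpty()) {
                if (outer) {
                    rows.add("INDEX=null, VALUE=null");
                }
                continue;
            }
            for (int i = 0; i < array.size(); i++) {
                rows.add("INDEX=" + i + ", VALUE=" + array.get(i));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> input = List.of(List.of("1", "2"), List.<String>of());
        System.out.println(flatten(input, false));
        System.out.println(flatten(input, true));
    }
}
```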
  - write
```
public DataFrameWriter write()
```
    Returns a DataFrameWriter object that you can use to write the data in the DataFrame to any supported destination. The default SaveMode for the returned DataFrameWriter is SaveMode.Append.
    Example:
    df.write().saveAsTable("table1");
    Returns:
    
    A DataFrameWriter
    
    Since:
    
    1.1.0
  - na
```
public DataFrameNaFunctions na()
```
    Returns a DataFrameNaFunctions object that provides functions for handling missing values in the DataFrame.
    
    Returns:
    
    The DataFrameNaFunctions
    
    Since:
    
    1.1.0
  - stat
```
public DataFrameStatFunctions stat()
```
    Returns a DataFrameStatFunctions object that provides statistic functions.
    
    Returns:
    
    The DataFrameStatFunctions
    
    Since:
    
    1.1.0
  - async
```
public DataFrameAsyncActor async()
```
    Returns a DataFrameAsyncActor object that can be used to execute DataFrame actions asynchronously.
    
    Returns:
    
    A DataFrameAsyncActor object
    
    Since:
    
    1.2.0
  - join
```
public DataFrame join(TableFunction func,
                      Column... args)
```
    Joins the current DataFrame with the output of the specified table function `func`.
    To pass arguments to the table function, use the `args` argument of this method. In the table function arguments, you can include references to columns in this DataFrame.
    For example:
    // The following example uses the split_to_table function to split
    // column 'a' in this DataFrame on the character ','.
    // Each row in the current DataFrame will produce N rows in the resulting DataFrame,
    // where N is the number of tokens in the column 'a'.
    df.join(TableFunctions.split_to_table(), df.col("a"), Functions.lit(","));
    
    Parameters:
    
    func - TableFunction object, which can be one of the values in the TableFunctions class or an object that you create from the TableFunction class.
    
    args - The arguments to pass to the table function.
    
    Returns:
    
    The result DataFrame
    
    Since:
    
    1.2.0
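    The "each row produces N rows" behavior of the split_to_table example can be modelled in plain Java. The sketch does not call Snowpark; `SplitJoinSketch`, `joinSplit`, and the two-element output rows are assumptions made for illustration, and edge cases of the SQL function may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative model only: this class is not part of Snowpark.
class SplitJoinSketch {
    // For each value of column 'a', emits one output row per token.
    // Each output row pairs the original value with one of its tokens,
    // which is the shape of a join between a row and its table function output.
    static List<String[]> joinSplit(List<String> columnA, String delimiter) {
        List<String[]> out = new ArrayList<>();
        for (String a : columnA) {
            // Pattern.quote treats the delimiter literally; the -1 limit
            // keeps trailing empty tokens.
            for (String token : a.split(Pattern.quote(delimiter), -1)) {
                out.add(new String[] {a, token});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] row : joinSplit(List.of("a,b,c", "x"), ",")) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```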
  - join
```
public DataFrame join(TableFunction func,
                      Column[] args,
                      Column[] partitionBy,
                      Column[] orderBy)
```
    Joins the current DataFrame with the output of the specified user-defined table function (UDTF) `func`.
    To pass arguments to the table function, use the `args` argument of this method. In the table function arguments, you can include references to columns in this DataFrame.
    To specify a PARTITION BY or ORDER BY clause, use the `partitionBy` and `orderBy` arguments.
    For example:
    // The following example passes the values in the column `col1` to the
    // user-defined tabular function (UDTF) `udtf`, partitioning the
    // data by `col2` and sorting the data by `col1`. The example returns
    // a new DataFrame that joins the contents of the current DataFrame with
    // the output of the UDTF.
    df.join(new TableFunction("udtf"),
        new Column[] {df.col("col1")},
        new Column[] {df.col("col2")},
        new Column[] {df.col("col1")});
    
    Parameters:
    
    func - An object that represents a user-defined table function (UDTF).
    
    args - An array of arguments to pass to the specified table function.
    
    partitionBy - An array of columns to partition by.
    
    orderBy - An array of columns to order by.
    
    Returns:
    
    The result DataFrame
    
    Since:
    
    1.7.0
  - join
```
public DataFrame join(TableFunction func,
                      Map<String,Column> args)
```
    Joins the current DataFrame with the output of the specified table function `func` that takes named parameters (e.g. `flatten`).
    To pass arguments to the table function, use the `args` argument of this method. Pass in a `Map` of parameter names and values. In these values, you can include references to columns in this DataFrame.
    For example:
    Map<String, Column> args = new HashMap<>();
    args.put("input", Functions.parse_json(df.col("a")));
    df.join(new TableFunction("flatten"), args);
    
    Parameters:
    
    func - TableFunction object, which can be one of the values in the TableFunctions class or an object that you create from the TableFunction class.
    
    args - Map of arguments to pass to the specified table function. Some functions, like `flatten`, have named parameters. Use this map to specify the parameter names and their corresponding values.
    
    Returns:
    
    The result DataFrame
    
    Since:
    
    1.2.0
  - join
```
public DataFrame join(TableFunction func,
                      Map<String,Column> args,
                      Column[] partitionBy,
                      Column[] orderBy)
```
    Joins the current DataFrame with the output of the specified user-defined table function (UDTF) `func`.
    To pass arguments to the table function, use the `args` argument of this method. Pass in a `Map` of parameter names and values. In these values, you can include references to columns in this DataFrame.
    To specify a PARTITION BY or ORDER BY clause, use the `partitionBy` and `orderBy` arguments.
    For example:
    // The following example passes the values in the column `col1` to the
    // user-defined tabular function (UDTF) `udtf`, partitioning the
    // data by `col2` and sorting the data by `col1`. The example returns
    // a new DataFrame that joins the contents of the current DataFrame with
    // the output of the UDTF.
    Map<String, Column> args = new HashMap<>();
    args.put("arg1", df.col("col1"));
    df.join(
        new TableFunction("udtf"),
        args,
        new Column[] {df.col("col2")},
        new Column[] {df.col("col1")}
    );
    
    Parameters:
    
    func - An object that represents a user-defined table function (UDTF).
    
    args - Map of arguments to pass to the specified table function. Some functions, like `flatten`, have named parameters. Use this map to specify the parameter names and their corresponding values.
    
    partitionBy - An array of columns to partition by.
    
    orderBy - An array of columns to order by.
    
    Returns:
    
    The result DataFrame
    
    Since:
    
    1.7.0
  - join
```
public DataFrame join(Column func)
```
    Joins the current DataFrame with the output of the specified table function `func`.
    Pre-defined table functions can be found in the `TableFunctions` class.
    For example:
    df.join(TableFunctions.flatten(
        Functions.parse_json(df.col("col")),
        "path", true, true, "both"
    ));
    
    Or load any Snowflake built-in table function via the TableFunction class:
    Map<String, Column> args = new HashMap<>();
    args.put("input", Functions.parse_json(df.col("a")));
    df.join(new TableFunction("flatten").call(args));
    
    Parameters:
    
    func - Column object, which can be one of the values in the TableFunctions class or an object that you create from `new TableFunction("name").call()`.
    
    Returns:
    
    The result DataFrame
    
    Since:
    
    1.10.0
  - join
```
public DataFrame join(Column func,
                      Column[] partitionBy,
                      Column[] orderBy)
```
    Joins the current DataFrame with the output of the specified table function `func`.
    To specify a PARTITION BY or ORDER BY clause, use the `partitionBy` and `orderBy` arguments.
    Pre-defined table functions can be found in the `TableFunctions` class.
    For example:
    df.join(TableFunctions.flatten(
        Functions.parse_json(df.col("col1")),
        "path", true, true, "both"
      ),
      new Column[] {df.col("col2")},
      new Column[] {df.col("col1")}
    );
    
    Or load any Snowflake built-in table function via the TableFunction class:
    Map<String, Column> args = new HashMap<>();
    args.put("input", Functions.parse_json(df.col("col1")));
    df.join(new TableFunction("flatten").call(args),
      new Column[] {df.col("col2")},
      new Column[] {df.col("col1")});
    
    Parameters:
    
    func - Column object, which can be one of the values in the TableFunctions class or an object that you create from `new TableFunction("name").call()`.
    
    partitionBy - An array of columns to partition by.
    
    orderBy - An array of columns to order by.
    
    Returns:
    
    The result DataFrame
    
    Since:
    
    1.10.0
