class DataFrame extends Logging

Represents a lazily-evaluated relational dataset that contains a collection of Row objects with columns defined by a schema (column name and type).

A DataFrame is considered lazy because it encapsulates the computation or query required to produce a relational dataset. The computation is not performed until you call a method that performs an action (e.g. collect).

Creating a DataFrame

You can create a DataFrame in a number of different ways, as shown in the examples below.

Example 1: Creating a DataFrame by reading a table.

val dfPrices = session.table("itemsdb.publicschema.prices")

Example 2: Creating a DataFrame by reading files from a stage.

val dfCatalog = session.read.csv("@stage/some_dir")

Example 3: Creating a DataFrame by specifying a sequence or a range.

val dfFromSeq = session.createDataFrame(Seq((1, "one"), (2, "two")))
val dfFromRange = session.range(1, 10, 2)

Example 4: Creating a new DataFrame by applying transformations to existing DataFrames.

val dfMergedData = dfCatalog.join(dfPrices, dfCatalog("itemId") === dfPrices("ID"))

Performing operations on a DataFrame

Broadly, the operations on DataFrame can be divided into two types:

  • Transformations produce a new DataFrame from one or more existing DataFrames. Note that transformations are lazy and don't cause the DataFrame to be evaluated. If the API does not provide a method to express the SQL that you want to use, you can use functions.sqlExpr as a workaround.
  • Actions cause the DataFrame to be evaluated. When you call a method that performs an action, Snowpark sends the SQL query for the DataFrame to the server for evaluation.
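
The distinction above can be sketched roughly as follows. This is only an illustration: the column names and the 1.08 factor are hypothetical, and dfPrices is assumed to be a DataFrame like the one in Example 1. The first function only builds a new lazy DataFrame (using functions.sqlExpr for an expression written as SQL text), while the second one triggers evaluation by calling an action.

```scala
import com.snowflake.snowpark.{DataFrame, Row}
import com.snowflake.snowpark.functions.{col, sqlExpr}

// Transformation: returns a new lazy DataFrame; no query is sent to the server here.
// "amount" and the 1.08 factor are hypothetical and used only for illustration.
def withTax(dfPrices: DataFrame): DataFrame =
  dfPrices.select(col("ID"), sqlExpr("amount * 1.08").as("amount_with_tax"))

// Action: collect() causes the SQL for the DataFrame to be sent for evaluation.
def fetchWithTax(dfPrices: DataFrame): Array[Row] =
  withTax(dfPrices).collect()
```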

Transforming a DataFrame

The following examples demonstrate how you can transform a DataFrame.

Example 5. Using the select method to select the columns that should be in the DataFrame (similar to adding a SELECT clause).

// Return a new DataFrame containing the ID and amount columns of the prices table. This is
// equivalent to:
//   SELECT ID, AMOUNT FROM PRICES;
val dfPriceIdsAndAmounts = dfPrices.select(col("ID"), col("amount"))

Example 6. Using the Column.as method to rename a column in a DataFrame (similar to using SELECT col AS alias).

// Return a new DataFrame containing the ID column of the prices table as a column named
// itemId. This is equivalent to:
//   SELECT ID AS itemId FROM PRICES;
val dfPriceItemIds = dfPrices.select(col("ID").as("itemId"))

Example 7. Using the filter method to filter data (similar to adding a WHERE clause).

// Return a new DataFrame containing the row from the prices table with the ID 1. This is
// equivalent to:
//   SELECT * FROM PRICES WHERE ID = 1;
val dfPrice1 = dfPrices.filter((col("ID") === 1))

Example 8. Using the sort method to specify the sort order of the data (similar to adding an ORDER BY clause).

// Return a new DataFrame for the prices table with the rows sorted by ID. This is equivalent
// to:
//   SELECT * FROM PRICES ORDER BY ID;
val dfSortedPrices = dfPrices.sort(col("ID"))

Example 9. Using the groupBy method to return a RelationalGroupedDataFrame that you can use to group and aggregate results (similar to adding a GROUP BY clause).

RelationalGroupedDataFrame provides methods for aggregating results, including:

  • avg (equivalent to AVG(column))
  • count (equivalent to COUNT())
  • max (equivalent to MAX(column))
  • median (equivalent to MEDIAN(column))
  • min (equivalent to MIN(column))
  • sum (equivalent to SUM(column))

// Return a new DataFrame for the prices table that computes the sum of the prices by
// category. This is equivalent to:
//   SELECT CATEGORY, SUM(AMOUNT) FROM PRICES GROUP BY CATEGORY;
val dfTotalPricePerCategory = dfPrices.groupBy(col("category")).sum(col("amount"))

Example 10. Using a Window to build a WindowSpec object that you can use for windowing functions (similar to using '<function> OVER ... PARTITION BY ... ORDER BY').

// Define a window that partitions prices by category and sorts the prices by date within the
// partition.
val window = Window.partitionBy(col("category")).orderBy(col("price_date"))
// Calculate the running sum of prices over this window. This is equivalent to:
//   SELECT CATEGORY, PRICE_DATE, SUM(AMOUNT) OVER
//       (PARTITION BY CATEGORY ORDER BY PRICE_DATE)
//       FROM PRICES ORDER BY PRICE_DATE;
val dfCumulativePrices = dfPrices.select(
    col("category"), col("price_date"),
    sum(col("amount")).over(window)).sort(col("price_date"))

Performing an action on a DataFrame

The following examples demonstrate how you can perform an action on a DataFrame.

Example 11: Performing a query and returning an array of Rows.

val results = dfPrices.collect()

Example 12: Performing a query and printing the results.

dfPrices.show()

Since

0.1.0

Linear Supertypes
Logging, AnyRef, Any

Value Members

  1. final def != ( arg0: Any ) : Boolean
    Definition Classes
    AnyRef → Any
  2. final def ## () : Int
    Definition Classes
    AnyRef → Any
  3. final def == ( arg0: Any ) : Boolean
    Definition Classes
    AnyRef → Any
  4. def action [ T ] ( funcName: String ) ( func: ⇒ T ) : T
    Attributes
    protected
    Annotations
    @inline ()
  5. def agg ( exprs: Array [ Column ] ) : DataFrame

    Aggregate the data in the DataFrame. Use this method if you don't need to group the data (groupBy).

    For the input value, pass in expressions that apply aggregation functions to columns (functions that are defined in the functions object).

    The following example calculates the maximum value of the num_sales column and the mean value of the price column:

    import com.snowflake.snowpark.functions._
    
    val dfAgg = df.agg(Array(max($"num_sales"), mean($"price")))

    exprs

    An array of expressions on columns.

    returns

    A DataFrame

    Since

    0.7.0

  6. def agg [ T ] ( exprs: Seq [ Column ] ) ( implicit arg0: ClassTag [ T ] ) : DataFrame

    Aggregate the data in the DataFrame. Use this method if you don't need to group the data (groupBy).

    For the input value, pass in expressions that apply aggregation functions to columns (functions that are defined in the functions object).

    The following example calculates the maximum value of the num_sales column and the mean value of the price column:

    import com.snowflake.snowpark.functions._
    
    val dfAgg = df.agg(Seq(max($"num_sales"), mean($"price")))

    exprs

    A list of expressions on columns.

    returns

    A DataFrame

    Since

    0.2.0

  7. def agg ( expr: Column , exprs: Column * ) : DataFrame

    Aggregate the data in the DataFrame. Use this method if you don't need to group the data (groupBy).

    For the input value, pass in expressions that apply aggregation functions to columns (functions that are defined in the functions object).

    The following example calculates the maximum value of the num_sales column and the mean value of the price column:

    import com.snowflake.snowpark.functions._
    
    val dfAgg = df.agg(max($"num_sales"), mean($"price"))

    expr

    A list of expressions on columns.

    returns

    A DataFrame

    Since

    0.1.0

  8. def agg ( exprs: Seq [( String , String )] ) : DataFrame

    Aggregate the data in the DataFrame. Use this method if you don't need to group the data (groupBy).

    For the input, pass in a sequence of key-value pairs that specify the column names and aggregation functions. For each pair:

    • Set the key to the name of the column to aggregate.
    • Set the value to the name of the aggregation function to use on that column.

    The following example calculates the maximum value of the num_sales column and the average value of the price column:

    val dfAgg = df.agg(Seq("num_sales" -> "max", "price" -> "mean"))

    This is equivalent to calling agg after calling groupBy without a column name:

    val dfAgg = df.groupBy().agg(Seq(df("num_sales") -> "max", df("price") -> "mean"))

    exprs

    A map of column names and aggregate functions.

    returns

    A DataFrame

    Since

    0.2.0

  9. def agg ( expr: ( String , String ) , exprs: ( String , String )* ) : DataFrame

    Aggregate the data in the DataFrame. Use this method if you don't need to group the data (groupBy).

    For the input, pass in one or more key-value pairs that specify the column names and aggregation functions. For each pair:

    • Set the key to the name of the column to aggregate.
    • Set the value to the name of the aggregation function to use on that column.

    The following example calculates the maximum value of the num_sales column and the average value of the price column:

    val dfAgg = df.agg("num_sales" -> "max", "price" -> "mean")

    This is equivalent to calling agg after calling groupBy without a column name:

    val dfAgg = df.groupBy().agg(df("num_sales") -> "max", df("price") -> "mean")

    expr

    A map of column names and aggregate functions.

    returns

    A DataFrame

    Since

    0.1.0

  10. def alias ( alias: String ) : DataFrame

    Returns the current DataFrame aliased as the input alias name.

    For example:

    val df2 = df.alias("A")
    df2.select(df2.col("A.num"))

    alias

    The alias name of the DataFrame.

    returns

    a DataFrame

    Since

    1.10.0

  11. def apply ( colName: String ) : Column

    Returns a reference to a column in the DataFrame. This method is identical to DataFrame.col.

    colName

    The name of the column.

    returns

    A Column

    Since

    0.1.0

  12. final def asInstanceOf [ T0 ] : T0
    Definition Classes
    Any
  13. def async : DataFrameAsyncActor

    Returns a DataFrameAsyncActor object that can be used to execute DataFrame actions asynchronously.

    Example:

    val asyncJob = df.async.collect()
    // At this point, the thread is not blocked. You can perform additional work before
    // calling asyncJob.getResult() to retrieve the results of the action.
    // NOTE: getResult() is a blocking call.
    val rows = asyncJob.getResult()

    returns

    A DataFrameAsyncActor object

    Since

    0.11.0

  14. def cacheResult () : HasCachedResult

    Caches the content of this DataFrame to create a new cached DataFrame.

    All subsequent operations on the returned cached DataFrame are performed on the cached data and have no effect on the original DataFrame.
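
    A minimal sketch of how the cached DataFrame might be reused for several actions. It assumes that the returned HasCachedResult can be used wherever a DataFrame is expected; the function name and the preview size are hypothetical.

```scala
import com.snowflake.snowpark.{DataFrame, Row}

// Sketch: cache a DataFrame once, then run several actions against the cached data.
def countAndPreview(expensive: DataFrame): (Long, Array[Row]) = {
  // Assumption: the returned HasCachedResult is usable like any other DataFrame.
  val cached = expensive.cacheResult()
  val total = cached.count()     // action on the cached data
  val preview = cached.first(5)  // another action on the cached data
  (total, preview)
}
```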

    returns

    A HasCachedResult

    Since

    0.4.0

  15. def clone () : DataFrame

    Returns a clone of this DataFrame.

    returns

    A DataFrame

    Definition Classes
    DataFrame → AnyRef
    Since

    0.4.0

  16. def col ( colName: String ) : Column

    Returns a reference to a column in the DataFrame.
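
    As an illustrative sketch, col (or the equivalent apply syntax) is typically useful when two DataFrames share a column name and a reference has to be tied to one of them. The column name "id" is hypothetical.

```scala
import com.snowflake.snowpark.DataFrame

// Sketch: both DataFrames are assumed to have a column named "id" (hypothetical).
// left("id") uses apply and right.col("id") uses col; both return a Column.
def joinOnId(left: DataFrame, right: DataFrame): DataFrame =
  left.join(right, left("id") === right.col("id"))
```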

    colName

    The name of the column.

    returns

    A Column

    Since

    0.1.0

  17. def collect () : Array [ Row ]

    Executes the query representing this DataFrame and returns the result as an Array of Row objects.

    returns

    An Array of Row

    Since

    0.1.0

  18. def count () : Long

    Executes the query representing this DataFrame and returns the number of rows in the result (similar to the COUNT function in SQL).

    returns

    The number of rows.

    Since

    0.1.0

  19. def createOrReplaceTempView ( multipartIdentifier: List [ String ] ) : Unit

    Creates a temporary view that returns the same results as this DataFrame.

    You can use the view in subsequent SQL queries and statements during the current session. The temporary view is only available in the session in which it is created.

    In multipartIdentifier, you can include the database and schema name to specify a fully-qualified name. If no database name or schema name is specified, the view will be created in the current database or schema.

    The view name must be a valid Snowflake identifier.

    multipartIdentifier

    A list of strings that specify the database name, schema name, and view name.

    Since

    0.5.0

  20. def createOrReplaceTempView ( multipartIdentifier: Seq [ String ] ) : Unit

    Creates a temporary view that returns the same results as this DataFrame.

    You can use the view in subsequent SQL queries and statements during the current session. The temporary view is only available in the session in which it is created.

    In multipartIdentifier, you can include the database and schema name to specify a fully-qualified name. If no database name or schema name is specified, the view will be created in the current database or schema.

    The view name must be a valid Snowflake identifier.

    multipartIdentifier

    A sequence of strings that specify the database name, schema name, and view name.

    Since

    0.5.0

  21. def createOrReplaceTempView ( viewName: String ) : Unit

    Creates a temporary view that returns the same results as this DataFrame.

    You can use the view in subsequent SQL queries and statements during the current session. The temporary view is only available in the session in which it is created.
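
    A small sketch of using the temporary view from a SQL statement in the same session. The view name my_temp_view is hypothetical.

```scala
import com.snowflake.snowpark.{DataFrame, Row, Session}

// Sketch: expose a DataFrame as a temporary view, then query the view with SQL.
// The view is only visible in the session that created it.
def queryThroughTempView(session: Session, df: DataFrame): Array[Row] = {
  df.createOrReplaceTempView("my_temp_view") // hypothetical view name
  session.sql("select * from my_temp_view").collect()
}
```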

    For viewName, you can include the database and schema name (i.e. specify a fully-qualified name). If no database name or schema name is specified, the view will be created in the current database or schema.

    viewName must be a valid Snowflake identifier.

    viewName

    The name of the view to create or replace.

    Since

    0.4.0

  22. def createOrReplaceView ( multipartIdentifier: List [ String ] ) : Unit

    Creates a view that captures the computation expressed by this DataFrame.

    In multipartIdentifier, you can include the database and schema name to specify a fully-qualified name. If no database name or schema name is specified, the view will be created in the current database or schema.

    The view name must be a valid Snowflake identifier.

    multipartIdentifier

    A list of strings that specifies the database name, schema name, and view name.

    Since

    0.5.0

  23. def createOrReplaceView ( multipartIdentifier: Seq [ String ] ) : Unit

    Creates a view that captures the computation expressed by this DataFrame.
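
    For illustration, the multipart form might be called as in the sketch below. The database, schema, and view names are hypothetical.

```scala
import com.snowflake.snowpark.DataFrame

// Sketch: create (or replace) a view from the computation captured by the DataFrame.
// my_db, my_schema, and my_view are hypothetical names.
def publishView(df: DataFrame): Unit =
  df.createOrReplaceView(Seq("my_db", "my_schema", "my_view"))
```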

    In multipartIdentifier, you can include the database and schema name to specify a fully-qualified name. If no database name or schema name is specified, the view will be created in the current database or schema.

    The view name must be a valid Snowflake identifier.

    multipartIdentifier

    A sequence of strings that specifies the database name, schema name, and view name.

    Since

    0.5.0

  24. def createOrReplaceView ( viewName: String ) : Unit

    Creates a view that captures the computation expressed by this DataFrame.

    For viewName, you can include the database and schema name (i.e. specify a fully-qualified name). If no database name or schema name is specified, the view will be created in the current database or schema.

    viewName must be a valid Snowflake identifier.

    viewName

    The name of the view to create or replace.

    Since

    0.1.0

  25. def crossJoin ( right: DataFrame ) : DataFrame

    Performs a cross join, which returns the cartesian product of the current DataFrame and another DataFrame (right).

    If the current and right DataFrames have columns with the same name, and you need to refer to one of these columns in the returned DataFrame, use the apply or col function on the current or right DataFrame to disambiguate references to these columns.

    For example:

    val dfCrossJoin = left.crossJoin(right)
    val project = dfCrossJoin.select(left("common_col") + right("common_col"))

    right

    The other DataFrame to join.

    returns

    A DataFrame

    Since

    0.1.0

  26. def cube ( cols: Seq [ String ] ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY CUBE on the DataFrame.

    cols

    A list of the names of columns to use.

    returns

    A RelationalGroupedDataFrame

    Since

    0.2.0

  27. def cube ( first: String , remaining: String * ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY CUBE on the DataFrame.
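
    As a hedged sketch, cube can be combined with an aggregation on the returned RelationalGroupedDataFrame. The column names category, region, and amount are hypothetical.

```scala
import com.snowflake.snowpark.DataFrame
import com.snowflake.snowpark.functions.{col, sum}

// Sketch: totals for every combination of the grouping columns, roughly:
//   SELECT CATEGORY, REGION, SUM(AMOUNT) FROM ... GROUP BY CUBE (CATEGORY, REGION);
// The column names are hypothetical.
def totalsByCube(dfSales: DataFrame): DataFrame =
  dfSales.cube("category", "region").agg(sum(col("amount")).as("total_amount"))
```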

    first

    The name of the first column to use.

    remaining

    A list of the names of additional columns to use.

    returns

    A RelationalGroupedDataFrame

    Since

    0.1.0

  28. def cube ( cols: Array [ Column ] ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY CUBE on the DataFrame.

    cols

    A list of expressions for columns to use.

    returns

    A RelationalGroupedDataFrame

    Since

    0.9.0

  29. def cube [ T ] ( cols: Seq [ Column ] ) ( implicit arg0: ClassTag [ T ] ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY CUBE on the DataFrame.

    cols

    A list of expressions for columns to use.

    returns

    A RelationalGroupedDataFrame

    Since

    0.2.0

  30. def cube ( first: Column , remaining: Column * ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY CUBE on the DataFrame.

    first

    The expression for the first column to use.

    remaining

    A list of expressions for additional columns to use.

    returns

    A RelationalGroupedDataFrame

    Since

    0.1.0

  31. def disambiguate ( lhs: DataFrame , rhs: DataFrame , joinType: JoinType , usingColumns: Seq [ String ] ) : ( DataFrame , DataFrame )
    Attributes
    protected
  32. def distinct () : DataFrame

    Returns a new DataFrame that contains only the rows with distinct values from the current DataFrame.

    This is equivalent to performing a SELECT DISTINCT in SQL.

    returns

    A DataFrame

    Since

    0.1.0

  33. def drop ( cols: Array [ Column ] ) : DataFrame

    Returns a new DataFrame that excludes the specified column expressions from the output.

    This is functionally equivalent to calling select and passing in all columns except the ones to exclude.

    This method throws a SnowparkClientException if:

    • A specified column does not have a name, or
    • The resulting DataFrame has no output columns.

    cols

    An array of expressions for the columns to exclude.

    returns

    A DataFrame

    Since

    0.7.0

  34. def drop [ T ] ( cols: Seq [ Column ] ) ( implicit arg0: ClassTag [ T ] ) : DataFrame

    Returns a new DataFrame that excludes the specified column expressions from the output.

    This is functionally equivalent to calling select and passing in all columns except the ones to exclude.

    This method throws a SnowparkClientException if:

    • A specified column does not have a name, or
    • The resulting DataFrame has no output columns.

    cols

    A list of expressions for the columns to exclude.

    returns

    A DataFrame

    Since

    0.2.0

  35. def drop ( first: Column , remaining: Column * ) : DataFrame

    Returns a new DataFrame that excludes the columns specified by the expressions from the output.

    This is functionally equivalent to calling select and passing in all columns except the ones to exclude.

    This method throws a SnowparkClientException if:

    • A specified column does not have a name, or
    • The resulting DataFrame has no output columns.

    first

    The expression for the first column to exclude.

    remaining

    A list of expressions for additional columns to exclude.

    returns

    A DataFrame

    Since

    0.1.0

  36. def drop ( colNames: Array [ String ] ) : DataFrame

    Returns a new DataFrame that excludes the columns with the specified names from the output.

    This is functionally equivalent to calling select and passing in all columns except the ones to exclude.

    Throws SnowparkClientException if the resulting DataFrame contains no output columns.

    colNames

    An array of the names of columns to exclude.

    returns

    A DataFrame

    Since

    0.7.0

  37. def drop ( colNames: Seq [ String ] ) : DataFrame

    Returns a new DataFrame that excludes the columns with the specified names from the output.

    This is functionally equivalent to calling select and passing in all columns except the ones to exclude.

    Throws SnowparkClientException if the resulting DataFrame contains no output columns.

    colNames

    A list of the names of columns to exclude.

    returns

    A DataFrame

    Since

    0.2.0

  38. def drop ( first: String , remaining: String * ) : DataFrame

    Returns a new DataFrame that excludes the columns with the specified names from the output.

    This is functionally equivalent to calling select and passing in all columns except the ones to exclude.

    Throws SnowparkClientException if the resulting DataFrame contains no output columns.
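
    The sketch below shows the name-based and expression-based forms side by side, plus one possible way to handle the exception. The column names and the helper names are hypothetical.

```scala
import com.snowflake.snowpark.{DataFrame, SnowparkClientException}
import com.snowflake.snowpark.functions.col

// Hypothetical column names, used only for illustration.
def dropByName(df: DataFrame): DataFrame =
  df.drop("created_at", "updated_at")

// The same exclusion expressed with column expressions.
def dropByColumn(df: DataFrame): DataFrame =
  df.drop(col("created_at"), col("updated_at"))

// Returns Left with the error message if Snowpark rejects the call,
// for example because no output columns would remain.
def tryDrop(df: DataFrame, names: Seq[String]): Either[String, DataFrame] =
  try Right(df.drop(names))
  catch { case e: SnowparkClientException => Left(e.getMessage) }
```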

    first

    The name of the first column to exclude.

    remaining

    A list of the names of additional columns to exclude.

    returns

    A DataFrame

    Since

    0.1.0

  39. def dropDuplicates ( colNames: String * ) : DataFrame

    Creates a new DataFrame by removing duplicated rows on a given subset of columns. If no subset of columns is specified, this function is the same as the distinct() function. The result is non-deterministic when duplicated rows are removed based on a subset of the columns rather than all columns.

    For example, suppose that we have a DataFrame df that contains three rows with the columns (a, b, c):

    (1, 1, 1), (1, 1, 2), (1, 2, 3)

    The result of df.dropDuplicates("a", "b") can be either

    (1, 1, 1), (1, 2, 3)

    or

    (1, 1, 2), (1, 2, 3)

    returns

    A DataFrame

    Since

    0.10.0
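
    Because dropDuplicates on a subset of columns does not guarantee which row is kept, one possible alternative (a sketch, not part of this method) is to rank the rows within each group and keep the first one. The example assumes columns named a, b, and c, as in the description above, and a hypothetical helper column named rn.

```scala
import com.snowflake.snowpark.{DataFrame, Window}
import com.snowflake.snowpark.functions.{col, row_number}

// Sketch: keep exactly one row per (a, b) group, choosing the row with the smallest c.
// "rn" is a hypothetical helper column that is dropped again at the end.
def keepSmallestC(df: DataFrame): DataFrame = {
  val w = Window.partitionBy(col("a"), col("b")).orderBy(col("c"))
  df.withColumn("rn", row_number().over(w))
    .filter(col("rn") === 1)
    .drop("rn")
}
```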

  40. final def eq ( arg0: AnyRef ) : Boolean
    Definition Classes
    AnyRef
  41. def equals ( arg0: Any ) : Boolean
    Definition Classes
    AnyRef → Any
  42. def except ( other: DataFrame ) : DataFrame

    Returns a new DataFrame that contains all the rows from the current DataFrame except for the rows that also appear in another DataFrame (other). Duplicate rows are eliminated.

    For example:

    val df1except2 = df1.except(df2)

    other

    The DataFrame that contains the rows to exclude.

    returns

    A DataFrame

    Since

    0.1.0

  43. def explain () : Unit

    Prints the list of queries that will be executed to evaluate this DataFrame. Prints the query execution plan if only one SELECT/DML/DDL statement will be executed.

    For more information about the query execution plan, see the EXPLAIN command.

    Since

    0.1.0

  44. def filter ( condition: Column ) : DataFrame

    Filters rows based on the specified conditional expression (similar to WHERE in SQL).

    For example:

    val dfFiltered = df.filter($"colA" > 1 && $"colB" < 100)

    condition

    Filter condition defined as an expression on columns.

    returns

    A filtered DataFrame

    Since

    0.1.0

  45. def first ( n: Int ) : Array [ Row ]

    Executes the query representing this DataFrame and returns the first n rows of the results.

    n

    The number of rows to return.

    returns

    An Array of the first n Row objects. If n is negative or larger than the number of rows in the results, returns all rows in the results.

    Since

    0.2.0

  46. def first () : Option [ Row ]

    Executes the query representing this DataFrame and returns the first row of results.
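
    A brief sketch of handling both variants of first. It assumes, hypothetically, that the first column of the result holds an integer; the helper names are illustrative.

```scala
import com.snowflake.snowpark.{DataFrame, Row}

// first() returns Option[Row], so an empty result can be handled without a null check.
// Assumption: the first column of the result is an integer (hypothetical).
def firstIdOrDefault(df: DataFrame, default: Int): Int =
  df.first().map(_.getInt(0)).getOrElse(default)

// first(n) returns up to n rows as an Array.
def preview(df: DataFrame): Array[Row] =
  df.first(3)
```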

    returns

    The first Row, if the row exists. Otherwise, returns None.

    Since

    0.2.0

  47. def flatten ( input: Column , path: String , outer: Boolean , recursive: Boolean , mode: String ) : DataFrame

    Flattens (explodes) compound values into multiple rows (similar to the SQL FLATTEN function).

    The flatten method adds the following columns to the returned DataFrame:

    • SEQ
    • KEY
    • PATH
    • INDEX
    • VALUE
    • THIS

    If this DataFrame also has columns with the names above, you can disambiguate the columns by using the this("value") syntax.

    For example, if the current DataFrame has a column named value:

    val table1 = session.sql(
      "select parse_json(value) as value from values('[1,2]') as T(value)")
    val flattened = table1.flatten(table1("value"), "", outer = false,
      recursive = false, "both")
    flattened.select(table1("value"), flattened("value").as("newValue")).show()

    input

    The expression that will be unnested into rows. The expression must be of data type VARIANT, OBJECT, or ARRAY.

    path

    The path to the element within a VARIANT data structure which needs to be flattened. Can be a zero-length string (i.e. empty path) if the outermost element is to be flattened.

    outer

    If FALSE, any input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries, are completely omitted from the output. Otherwise, exactly one row is generated for zero-row expansions (with NULL in the KEY, INDEX, and VALUE columns).

    recursive

    If FALSE, only the element referenced by PATH is expanded. Otherwise, the expansion is performed for all sub-elements recursively.

    mode

    Specifies whether only OBJECT, ARRAY, or BOTH should be flattened.

    returns

    A DataFrame containing the flattened values.

    Since

    0.2.0

  48. def flatten ( input: Column ) : DataFrame

    Flattens (explodes) compound values into multiple rows (similar to the SQL FLATTEN function).

    The flatten method adds the following columns to the returned DataFrame:

    • SEQ
    • KEY
    • PATH
    • INDEX
    • VALUE
    • THIS

    If this DataFrame also has columns with the names above, you can disambiguate the columns by using the this("value") syntax.

    For example, if the current DataFrame has a column named value:

    val table1 = session.sql(
      "select parse_json(value) as value from values('[1,2]') as T(value)")
    val flattened = table1.flatten(table1("value"))
    flattened.select(table1("value"), flattened("value").as("newValue")).show()

    input

    The expression that will be unnested into rows. The expression must be of data type VARIANT, OBJECT, or ARRAY.

    returns

    A DataFrame containing the flattened values.

    Since

    0.2.0

  49. final def getClass () : Class [_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native () @HotSpotIntrinsicCandidate ()
  50. def groupBy ( cols: Array [ String ] ) : RelationalGroupedDataFrame

    Groups rows by the columns specified by name (similar to GROUP BY in SQL).

    This method returns a RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.

    cols

    An array of the names of columns to group by.

    returns

    A RelationalGroupedDataFrame

    Since

    0.7.0

  51. def groupBy ( cols: Seq [ String ] ) : RelationalGroupedDataFrame

    Groups rows by the columns specified by name (similar to GROUP BY in SQL).

    This method returns a RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.

    cols

    A list of the names of columns to group by.

    returns

    A RelationalGroupedDataFrame

    Since

    0.2.0

  52. def groupBy ( first: String , remaining: String * ) : RelationalGroupedDataFrame

    Groups rows by the columns specified by name (similar to GROUP BY in SQL).

    This method returns a RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.
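
    For illustration, a grouped aggregation might look like the sketch below. The column names category and amount and the result aliases are hypothetical.

```scala
import com.snowflake.snowpark.DataFrame
import com.snowflake.snowpark.functions.{col, max, sum}

// Sketch: per-category total and maximum, roughly:
//   SELECT CATEGORY, SUM(AMOUNT), MAX(AMOUNT) FROM ... GROUP BY CATEGORY;
// The column names and aliases are hypothetical.
def statsPerCategory(dfPrices: DataFrame): DataFrame =
  dfPrices
    .groupBy("category")
    .agg(sum(col("amount")).as("total_amount"), max(col("amount")).as("max_amount"))
```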

    first

    The name of the first column to group by.

    remaining

    A list of the names of additional columns to group by.

    returns

    A RelationalGroupedDataFrame

    Since

    0.1.0

  53. def groupBy ( cols: Array [ Column ] ) : RelationalGroupedDataFrame

    Groups rows by the columns specified by expressions (similar to GROUP BY in SQL).

    This method returns a RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.

    cols

    An array of expressions on columns.

    returns

    A RelationalGroupedDataFrame

    Since

    0.7.0

  54. def groupBy [ T ] ( cols: Seq [ Column ] ) ( implicit arg0: ClassTag [ T ] ) : RelationalGroupedDataFrame

    Groups rows by the columns specified by expressions (similar to GROUP BY in SQL).

    This method returns a RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.

    cols

    A list of expressions on columns.

    returns

    A RelationalGroupedDataFrame

    Since

    0.2.0

  55. def groupBy () : RelationalGroupedDataFrame

    Returns a RelationalGroupedDataFrame that you can use to perform aggregations on the underlying DataFrame.

    returns

    A RelationalGroupedDataFrame

    Since

    0.1.0

  56. def groupBy ( first: Column , remaining: Column * ) : RelationalGroupedDataFrame

    Groups rows by the columns specified by expressions (similar to GROUP BY in SQL).

    This method returns a RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.

    first

    The expression for the first column to group by.

    remaining

    A list of expressions for additional columns to group by.

    returns

    A RelationalGroupedDataFrame

    Since

    0.1.0
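
    The groupBy overloads above are typically followed by an aggregation on the returned RelationalGroupedDataFrame. The following is a minimal sketch, assuming the Snowpark library is on the classpath and that df has columns named category, region, and amount. Those column names, the object name, and the method names are hypothetical and chosen only for illustration.

```scala
import com.snowflake.snowpark.DataFrame
import com.snowflake.snowpark.functions.{avg, col, sum}

object GroupByExamples {
  // Group by one column given by name, then sum a (hypothetical) amount column.
  def totalsByCategory(df: DataFrame): DataFrame =
    df.groupBy("category").agg(sum(col("amount")).as("total_amount"))

  // Group by two columns given as Column expressions, then average the amount.
  def averagesByCategoryAndRegion(df: DataFrame): DataFrame =
    df.groupBy(col("category"), col("region")).agg(avg(col("amount")).as("avg_amount"))

  // groupBy() with no arguments aggregates over the whole DataFrame.
  def grandTotal(df: DataFrame): DataFrame =
    df.groupBy().agg(sum(col("amount")).as("total_amount"))
}
```

    Each method returns a new DataFrame. Nothing is evaluated until an action such as collect or show is called on the result.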

  57. def groupByGroupingSets ( groupingSets: Seq [ GroupingSets ] ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY GROUPING SETS on the DataFrame.

    GROUP BY GROUPING SETS is an extension of the GROUP BY clause that allows computing multiple GROUP BY clauses in a single statement. The group set is a set of dimension columns.

    GROUP BY GROUPING SETS is equivalent to the UNION of two or more GROUP BY operations in the same result set:

    df.groupByGroupingSets(GroupingSets(Set(col("a")))) is equivalent to df.groupBy("a")

    and

    df.groupByGroupingSets(GroupingSets(Set(col("a")), Set(col("b")))) is equivalent to df.groupBy("a") union df.groupBy("b")

    groupingSets

    A list of GroupingSets objects.

    Since

    0.4.0

  58. def groupByGroupingSets ( first: GroupingSets , remaining: GroupingSets * ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY GROUPING SETS on the DataFrame.

    GROUP BY GROUPING SETS is an extension of the GROUP BY clause that allows computing multiple GROUP BY clauses in a single statement. The group set is a set of dimension columns.

    GROUP BY GROUPING SETS is equivalent to the UNION of two or more GROUP BY operations in the same result set:

    df.groupByGroupingSets(GroupingSets(Set(col("a")))) is equivalent to df.groupBy("a")

    and

    df.groupByGroupingSets(GroupingSets(Set(col("a")), Set(col("b")))) is equivalent to df.groupBy("a") union df.groupBy("b")

    first

    A GroupingSets object.

    remaining

    A list of additional GroupingSets objects.

    Since

    0.4.0
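
    The equivalence described for groupByGroupingSets can be illustrated without a Snowflake session. The following plain-Scala sketch models it on an in-memory collection. It is not how Snowpark evaluates the query, and the Sale and Out types, the sample data, and the use of a sum as the aggregate are assumptions made for illustration.

```scala
object GroupingSetsModel {
  final case class Sale(a: String, b: String, amount: Int)

  // One output row: the grouping key for each dimension (None where the
  // dimension is not part of the grouping set, like NULL in SQL) and the sum.
  final case class Out(a: Option[String], b: Option[String], total: Int)

  // Aggregates `rows` once per grouping set and concatenates the results.
  def groupingSets(rows: Seq[Sale], sets: Seq[Set[String]]): Seq[Out] =
    sets.flatMap { dims =>
      rows
        .groupBy(r => (if (dims("a")) Some(r.a) else None, if (dims("b")) Some(r.b) else None))
        .map { case ((ka, kb), group) => Out(ka, kb, group.map(_.amount).sum) }
    }

  val sales: Seq[Sale] = Seq(Sale("x", "p", 1), Sale("x", "q", 2), Sale("y", "p", 4))

  // Grouping sets {a} and {b}: the rows of "group by a" plus the rows of "group by b".
  val result: Set[Out] = groupingSets(sales, Seq(Set("a"), Set("b"))).toSet
}
```

    In this model, result contains one row per distinct value of a and one row per distinct value of b, which matches the description of combining df.groupBy("a") and df.groupBy("b").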

  59. def hashCode () : Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native () @HotSpotIntrinsicCandidate ()
  60. def intersect ( other: DataFrame ) : DataFrame

    Returns a new DataFrame that contains the intersection of rows from the current DataFrame and another DataFrame ( other ). Duplicate rows are eliminated.

    For example:

    val dfIntersectionOf1and2 = df1.intersect(df2)
    other

    The other DataFrame that contains the rows to use for the intersection.

    returns

    A DataFrame

    Since

    0.1.0

  61. final def isInstanceOf [ T0 ] : Boolean
    Definition Classes
    Any
  62. def join ( func: Column , partitionBy: Seq [ Column ] , orderBy: Seq [ Column ] ) : DataFrame

    Joins the current DataFrame with the output of the specified user-defined table function (UDTF) func .

    To specify a PARTITION BY or ORDER BY clause, use the partitionBy and orderBy arguments.

    For example:

    val tf = session.udtf.registerTemporary(TableFunc1)
    df.join(tf(Map("arg1" -> df("col1"))), Seq(df("col2")), Seq(df("col1")))
    func

    TableFunction object that represents a user-defined table function.

    partitionBy

    A list of columns to partition by.

    orderBy

    A list of columns to order by.

    Since

    1.10.0

  63. def join ( func: Column ) : DataFrame

    Joins the current DataFrame with the output of the specified table function func .

    For example:

    // The following example uses the flatten function to explode compound values from
    // column 'a' in this DataFrame into multiple columns.
    
    import com.snowflake.snowpark.functions._
    import com.snowflake.snowpark.tableFunctions._
    
    df.join(
      tableFunctions.flatten(parse_json(df("a")))
    )
    func

    TableFunction object, which can be one of the values in the tableFunctions object or an object that you create by calling TableFunction.apply() .

    Since

    1.10.0

  64. def join ( func: TableFunction , args: Map [ String , Column ] , partitionBy: Seq [ Column ] , orderBy: Seq [ Column ] ) : DataFrame

    Joins the current DataFrame with the output of the specified user-defined table function (UDTF) func .

    To pass arguments to the table function, use the args argument of this method. Pass in a Map of parameter names and values. In these values, you can include references to columns in this DataFrame.

    To specify a PARTITION BY or ORDER BY clause, use the partitionBy and orderBy arguments.

    For example:

    // The following example passes the values in the column `col1` to the
    // user-defined tabular function (UDTF) `udtf`, partitioning the
    // data by `col2` and sorting the data by `col1`. The example returns
    // a new DataFrame that joins the contents of the current DataFrame with
    // the output of the UDTF.
    df.join(
      TableFunction("udtf"),
      Map("arg1" -> df("col1")),
      Seq(df("col2")),
      Seq(df("col1"))
    )
    func

    TableFunction object that represents a user-defined table function (UDTF).

    args

    Map of arguments to pass to the specified table function. Some functions, like flatten , have named parameters. Use this map to specify the parameter names and their corresponding values.

    partitionBy

    A list of columns to partition by.

    orderBy

    A list of columns to order by.

    Since

    1.7.0

  65. def join ( func: TableFunction , args: Map [ String , Column ] ) : DataFrame

    Joins the current DataFrame with the output of the specified table function func that takes named parameters (e.g. flatten ).

    To pass arguments to the table function, use the args argument of this method. Pass in a Map of parameter names and values. In these values, you can include references to columns in this DataFrame.

    For example:

    // The following example uses the flatten function to explode compound values from
    // column 'a' in this DataFrame into multiple columns.
    
    import com.snowflake.snowpark.functions._
    import com.snowflake.snowpark.tableFunctions._
    
    df.join(
      TableFunction("flatten"),
      Map("input" -> parse_json(df("a")))
    )
    func

    TableFunction object, which can be one of the values in the tableFunctions object or an object that you create from the TableFunction class.

    args

    Map of arguments to pass to the specified table function. Some functions, like flatten , have named parameters. Use this map to specify the parameter names and their corresponding values.

    Since

    0.4.0

  66. def join ( func: TableFunction , args: Seq [ Column ] , partitionBy: Seq [ Column ] , orderBy: Seq [ Column ] ) : DataFrame

    Joins the current DataFrame with the output of the specified user-defined table function (UDTF) func .

    To pass arguments to the table function, use the args argument of this method. In the table function arguments, you can include references to columns in this DataFrame.

    To specify a PARTITION BY or ORDER BY clause, use the partitionBy and orderBy arguments.

    For example:

    // The following example passes the values in the column `col1` to the
    // user-defined tabular function (UDTF) `udtf`, partitioning the
    // data by `col2` and sorting the data by `col1`. The example returns
    // a new DataFrame that joins the contents of the current DataFrame with
    // the output of the UDTF.
    df.join(TableFunction("udtf"), Seq(df("col1")), Seq(df("col2")), Seq(df("col1")))
    func

    TableFunction object that represents a user-defined table function (UDTF).

    args

    A list of arguments to pass to the specified table function.

    partitionBy

    A list of columns to partition by.

    orderBy

    A list of columns to order by.

    Since

    1.7.0

  67. def join ( func: TableFunction , args: Seq [ Column ] ) : DataFrame

    Joins the current DataFrame with the output of the specified table function func .

    To pass arguments to the table function, use the args argument of this method. In the table function arguments, you can include references to columns in this DataFrame.

    For example:

    // The following example uses the split_to_table function to split
    // column 'a' in this DataFrame on the character ','.
    // Each row in this DataFrame will produce N rows in the resulting DataFrame,
    // where N is the number of tokens in the column 'a'.
    import com.snowflake.snowpark.functions._
    import com.snowflake.snowpark.tableFunctions._
    
    df.join(split_to_table, Seq(df("a"), lit(",")))
    func

    TableFunction object, which can be one of the values in the tableFunctions object or an object that you create from the TableFunction class.

    args

    A list of arguments to pass to the specified table function.

    Since

    0.4.0

  68. def join ( func: TableFunction , firstArg: Column , remaining: Column * ) : DataFrame

    Joins the current DataFrame with the output of the specified table function func .

    To pass arguments to the table function, use the firstArg and remaining arguments of this method. In the table function arguments, you can include references to columns in this DataFrame.

    For example:

    // The following example uses the split_to_table function to split
    // column 'a' in this DataFrame on the character ','.
    // Each row in the current DataFrame will produce N rows in the resulting DataFrame,
    // where N is the number of tokens in the column 'a'.
    
    import com.snowflake.snowpark.functions._
    import com.snowflake.snowpark.tableFunctions._
    
    df.join(split_to_table, df("a"), lit(","))
    func

    TableFunction object, which can be one of the values in the tableFunctions object or an object that you create from the TableFunction class.

    firstArg

    The first argument to pass to the specified table function.

    remaining

    A list of any additional arguments for the specified table function.

    Since

    0.4.0

  69. def join ( right: DataFrame , joinExprs: Column , joinType: String ) : DataFrame

    Performs a join of the specified type ( joinType ) with the current DataFrame and another DataFrame ( right ) using the join condition specified in an expression ( joinExprs ).

    To disambiguate columns with the same name in the left DataFrame and right DataFrame, use the apply or col method of each DataFrame ( df("col") or df.col("col") ). You can use this approach to disambiguate columns in the joinExprs parameter and to refer to columns in the returned DataFrame.

    For example:

    val dfJoin = df1.join(df2, df1("a") === df2("b"), "left")
    val dfJoin2 = df1.join(df2, df1("a") === df2("b") && df1("c") === df2("d"), "outer")
    val dfJoin3 = df1.join(df2, df1("a") === df2("a") && df1("b") === df2("b"), "outer")
    // If both df1 and df2 contain column 'c'
    val project = dfJoin3.select(df1("c") + df2("c"))

    If you need to join a DataFrame with itself, keep in mind that there is no way to distinguish between columns on the left and right sides in a join expression. For example:

    val dfJoined = df.join(df, df("a") === df("b"), joinType) // Column references are ambiguous

    To do a self-join, you can either clone the DataFrame (using clone ) as follows,

    val clonedDf = df.clone
    val dfJoined = df.join(clonedDf, df("a") === clonedDf("b"), joinType)

    or you can call a join method that accepts a usingColumns parameter.

    right

    The other DataFrame to join.

    joinExprs

    Expression that specifies the join condition.

    joinType

    The type of join (e.g. "right" , "outer" , etc.).

    returns

    A DataFrame

    Since

    0.1.0

  70. def join ( right: DataFrame , joinExprs: Column ) : DataFrame

    Performs a default inner join of the current DataFrame and another DataFrame ( right ) using the join condition specified in an expression ( joinExprs ).

    To disambiguate columns with the same name in the left DataFrame and right DataFrame, use the apply or col method of each DataFrame ( df("col") or df.col("col") ). You can use this approach to disambiguate columns in the joinExprs parameter and to refer to columns in the returned DataFrame.

    For example:

    val dfJoin = df1.join(df2, df1("a") === df2("b"))
    val dfJoin2 = df1.join(df2, df1("a") === df2("b") && df1("c") === df2("d"))
    val dfJoin3 = df1.join(df2, df1("a") === df2("a") && df1("b") === df2("b"))
    // If both df1 and df2 contain column 'c'
    val project = dfJoin3.select(df1("c") + df2("c"))

    If you need to join a DataFrame with itself, keep in mind that there is no way to distinguish between columns on the left and right sides in a join expression. For example:

    val dfJoined = df.join(df, df("a") === df("b")) // Column references are ambiguous

    As a workaround, you can either construct the left and right DataFrames separately, or you can call a join method that accepts a usingColumns parameter.

    right

    The other DataFrame to join.

    joinExprs

    Expression that specifies the join condition.

    returns

    A DataFrame

    Since

    0.1.0

  71. def join ( right: DataFrame , usingColumns: Seq [ String ] , joinType: String ) : DataFrame

    Performs a join of the specified type ( joinType ) with the current DataFrame and another DataFrame ( right ) on a list of columns ( usingColumns ).

    The method assumes that the columns in usingColumns have the same meaning in the left and right DataFrames.

    For example:

    val dfLeftJoin = df1.join(df2, Seq("a"), "left")
    val dfOuterJoin = df1.join(df2, Seq("a", "b"), "outer")
    right

    The other DataFrame to join.

    usingColumns

    A list of the names of the columns to use for the join.

    joinType

    The type of join (e.g. "right" , "outer" , etc.).

    returns

    A DataFrame

    Since

    0.1.0

  72. def join ( right: DataFrame , usingColumns: Seq [ String ] ) : DataFrame

    Performs a default inner join of the current DataFrame and another DataFrame ( right ) on a list of columns ( usingColumns ).

    The method assumes that the columns in usingColumns have the same meaning in the left and right DataFrames.

    For example:

    val dfJoinOnColA = df.join(df2, Seq("a"))
    val dfJoinOnColAAndColB = df.join(df2, Seq("a", "b"))
    right

    The other DataFrame to join.

    usingColumns

    A list of the names of the columns to use for the join.

    returns

    A DataFrame

    Since

    0.1.0

  73. def join ( right: DataFrame , usingColumn: String ) : DataFrame

    Performs a default inner join of the current DataFrame and another DataFrame ( right ) on a column ( usingColumn ).

    The method assumes that the usingColumn column has the same meaning in the left and right DataFrames.

    For example:

    val result = left.join(right, "a")
    right

    The other DataFrame to join.

    usingColumn

    The name of the column to use for the join.

    returns

    A DataFrame

    Since

    0.1.0

  74. def join ( right: DataFrame ) : DataFrame

    Performs a default inner join of the current DataFrame and another DataFrame ( right ).

    Because this method does not specify a join condition, the returned DataFrame is a Cartesian product of the two DataFrames.

    If the current and right DataFrames have columns with the same name, and you need to refer to one of these columns in the returned DataFrame, use the apply or col function on the current or right DataFrame to disambiguate references to these columns.

    For example:

    val result = left.join(right)
    val project = result.select(left("common_col") + right("common_col"))
    right

    The other DataFrame to join.

    returns

    A DataFrame

    Since

    0.1.0

  75. def limit ( n: Int ) : DataFrame

    Returns a new DataFrame that contains at most n rows from the current DataFrame (similar to LIMIT in SQL).

    Note that this is a transformation method and not an action method.

    n

    Number of rows to return.

    returns

    A DataFrame

    Since

    0.1.0
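
    Because limit is a transformation, it only describes the query. Rows are fetched when an action is called on the result. The following is a minimal sketch, assuming the Snowpark library is on the classpath. The object and method names are hypothetical.

```scala
import com.snowflake.snowpark.{DataFrame, Row}

object LimitExample {
  // Returns at most `n` rows of `df` to the client.
  def firstRows(df: DataFrame, n: Int): Array[Row] = {
    // Transformation: builds a new DataFrame without evaluating it.
    val limited: DataFrame = df.limit(n)
    // Action: evaluates the DataFrame and retrieves the rows.
    limited.collect()
  }
}
```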

  76. def log () : Logger
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  77. def logDebug ( msg: String , throwable: Throwable ) : Unit
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  78. def logDebug ( msg: String ) : Unit
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  79. def logError ( msg: String , throwable: Throwable ) : Unit
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  80. def logError ( msg: String ) : Unit
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  81. def logInfo ( msg: String , throwable: Throwable ) : Unit
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  82. def logInfo ( msg: String ) : Unit
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  83. def logTrace ( msg: String , throwable: Throwable ) : Unit
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  84. def logTrace ( msg: String ) : Unit
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  85. def logWarning ( msg: String , throwable: Throwable ) : Unit
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  86. def logWarning ( msg: String ) : Unit
    Attributes
    protected[ internal ]
    Definition Classes
    Logging
  87. lazy val na : DataFrameNaFunctions

    Returns a DataFrameNaFunctions object that provides functions for handling missing values in the DataFrame.

    Since

    0.2.0
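
    As an illustration of how na is typically used, the following sketch calls the drop and fill methods of DataFrameNaFunctions. It assumes the Snowpark library is on the classpath and that df has a numeric column qty and a string column label. The column names, object name, and method names are hypothetical, and the DataFrameNaFunctions documentation remains the reference for the exact behavior of each method.

```scala
import com.snowflake.snowpark.DataFrame

object MissingValueExamples {
  // Keeps only the rows that have at least one non-null value
  // among the (hypothetical) columns "qty" and "label".
  def dropEmptyRows(df: DataFrame): DataFrame =
    df.na.drop(1, Seq("qty", "label"))

  // Replaces missing values with a per-column default.
  def fillDefaults(df: DataFrame): DataFrame =
    df.na.fill(Map("qty" -> 0, "label" -> "unknown"))
}
```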

  88. def naturalJoin ( right: DataFrame , joinType: String ) : DataFrame

    Performs a natural join of the specified type ( joinType ) with the current DataFrame and another DataFrame ( right ).

    For example:

    val dfNaturalJoin = df.naturalJoin(df2, "left")
    right

    The other DataFrame to join.

    joinType

    The type of join (e.g. "right" , "outer" , etc.).

    returns

    A DataFrame

    Since

    0.1.0

  89. def naturalJoin ( right: DataFrame ) : DataFrame

    Performs a natural join (a default inner join) of the current DataFrame and another DataFrame ( right ).

    For example:

    val dfNaturalJoin = df.naturalJoin(df2)

    Note that this is equivalent to:

    val dfNaturalJoin = df.naturalJoin(df2, "inner")
    right

    The other DataFrame to join.

    returns

    A DataFrame

    Since

    0.1.0

  90. final def ne ( arg0: AnyRef ) : Boolean
    Definition Classes
    AnyRef
  91. final def notify () : Unit
    Definition Classes
    AnyRef
    Annotations
    @native () @HotSpotIntrinsicCandidate ()
  92. final def notifyAll () : Unit
    Definition Classes
    AnyRef
    Annotations
    @native () @HotSpotIntrinsicCandidate ()
  93. def pivot ( pivotColumn: Column , values: Seq [ Any ] ) : RelationalGroupedDataFrame

    Rotates this DataFrame by turning the unique values from one column in the input expression into multiple columns and aggregating results where required on any remaining column values.

    Only one aggregate is supported with pivot.

    For example:

    val dfPivoted = df.pivot(col("col_1"), Seq(1,2,3)).agg(sum(col("col_2")))
    pivotColumn

    Expression for the column that you want to use.

    values

    A list of values in the column.

    returns

    A RelationalGroupedDataFrame

    Since

    0.1.0

  94. def pivot ( pivotColumn: String , values: Seq [ Any ] ) : RelationalGroupedDataFrame

    Rotates this DataFrame by turning the unique values from one column in the input expression into multiple columns and aggregating results where required on any remaining column values.

    Only one aggregate is supported with pivot.

    For example:

    val dfPivoted = df.pivot("col_1", Seq(1,2,3)).agg(sum(col("col_2")))
    pivotColumn

    The name of the column to use.

    values

    A list of values in the column.

    returns

    A RelationalGroupedDataFrame

    Since

    0.1.0

  95. def randomSplit ( weights: Array [ Double ] ) : Array [ DataFrame ]

    Randomly splits the current DataFrame into separate DataFrames, using the specified weights.

    NOTE:

    • If only one weight is specified, the returned DataFrame array only includes the current DataFrame.
    • If multiple weights are specified, the current DataFrame will be cached before being split.
    weights

    Weights to use for splitting the DataFrame. If the weights don't add up to 1, the weights will be normalized.

    returns

    A list of DataFrame objects

    Since

    0.2.0
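
    The normalization of weights mentioned for randomSplit can be sketched in plain Scala as follows. This models only the arithmetic and is not Snowpark's implementation. The object and method names are hypothetical, and the requirement that every weight be positive is an assumption of this sketch.

```scala
object RandomSplitWeights {
  // Scales the weights so that they add up to 1 while keeping their ratios.
  def normalize(weights: Array[Double]): Array[Double] = {
    require(weights.nonEmpty && weights.forall(_ > 0), "weights must be positive")
    val total = weights.sum
    weights.map(_ / total)
  }
}
```

    For instance, weights of 3.0 and 1.0 normalize to 0.75 and 0.25, so a call such as df.randomSplit(Array(3.0, 1.0)) would be expected to place roughly three quarters of the rows in the first DataFrame.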

  96. def rename ( newName: String , col: Column ) : DataFrame

    Returns a DataFrame with the specified column col renamed as newName .

    This example renames the column A as NEW_A in the DataFrame.

    val df = session.sql("select 1 as A, 2 as B")
    val dfRenamed = df.rename("NEW_A", col("A"))
    newName

    The new name for the column

    col

    The Column to be renamed

    returns

    A DataFrame

    Since

    0.9.0

  97. def rollup ( cols: Array [ String ] ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY ROLLUP on the DataFrame.

    cols

    An array of column names.

    returns

    A RelationalGroupedDataFrame

    Since

    0.7.0

  98. def rollup ( cols: Seq [ String ] ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY ROLLUP on the DataFrame.

    cols

    A list of column names.

    returns

    A RelationalGroupedDataFrame

    Since

    0.2.0

  99. def rollup ( first: String , remaining: String * ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY ROLLUP on the DataFrame.

    first

    The name of the first column.

    remaining

    A list of the names of additional columns.

    returns

    A RelationalGroupedDataFrame

    Since

    0.1.0

  100. def rollup ( cols: Array [ Column ] ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY ROLLUP on the DataFrame.

    cols

    An array of expressions on columns.

    returns

    A RelationalGroupedDataFrame

    Since

    0.7.0

  101. def rollup [ T ] ( cols: Seq [ Column ] ) ( implicit arg0: ClassTag [ T ] ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY ROLLUP on the DataFrame.

    cols

    A list of expressions on columns.

    returns

    A RelationalGroupedDataFrame

    Since

    0.2.0

  102. def rollup ( first: Column , remaining: Column * ) : RelationalGroupedDataFrame

    Performs an SQL GROUP BY ROLLUP on the DataFrame.

    first

    The expression for the first column.

    remaining

    A list of expressions for additional columns.

    returns

    A RelationalGroupedDataFrame

    Since

    0.1.0
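
    In SQL, GROUP BY ROLLUP over a list of columns aggregates once for every prefix of that list, from the full list down to the empty list (the grand total). The following plain-Scala sketch lists those grouping levels for a given column list. It only models the semantics and does not call Snowpark. The object name, method name, and column names are hypothetical.

```scala
object RollupModel {
  // Returns the grouping levels that ROLLUP(cols) aggregates over:
  // every prefix of `cols`, longest first, ending with the empty list.
  def rollupGroupingSets(cols: Seq[String]): List[Seq[String]] =
    cols.inits.toList
}
```

    For the columns city and store, the levels are (city, store), (city), and (), so a call such as df.rollup("city", "store") followed by an aggregation would be expected to return per-store rows, per-city subtotals, and a grand total.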

  103. def sample ( probabilityFraction: Double ) : DataFrame

    Returns a new DataFrame that contains a sampling of rows from the current DataFrame.

    NOTE:

    • The number of rows returned may be close to (but not exactly equal to) (probabilityFraction * totalRowCount) .
    • The Snowflake SAMPLE function supports specifying 'probability' as a percentage number. The range of 'probability' is [0.0, 100.0] . The conversion formula is probability = probabilityFraction * 100 .
    probabilityFraction

    The fraction of rows to sample. This must be in the range of 0.0 to 1.0 .

    returns

    A DataFrame containing the sample of rows.

    Since

    0.2.0

  104. def sample ( num: Long ) : DataFrame

    Returns a new DataFrame with a sample of N rows from the underlying DataFrame.

    NOTE:

    • If the row count in the DataFrame is larger than the requested number of rows, the method returns a DataFrame containing the number of requested rows.
    • If the row count in the DataFrame is smaller than the requested number of rows, the method returns a DataFrame containing all rows.
    num

    The number of rows to sample in the range of 0 to 1,000,000.

    returns

    A DataFrame containing the sample of num rows.

    Since

    0.2.0
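
    The arithmetic in the notes for the two sample overloads can be sketched in plain Scala. This is an illustration of the stated rules only, not Snowpark's implementation, and the object and method names are hypothetical.

```scala
object SampleModel {
  // Converts a fraction in [0.0, 1.0] to the percentage in [0.0, 100.0]
  // that the Snowflake SAMPLE function expects.
  def toSnowflakeProbability(probabilityFraction: Double): Double = {
    require(probabilityFraction >= 0.0 && probabilityFraction <= 1.0,
      "probabilityFraction must be in the range 0.0 to 1.0")
    probabilityFraction * 100
  }

  // Number of rows returned by sample(num): the requested number,
  // or all rows when the DataFrame has fewer rows than requested.
  def sampledRowCount(totalRowCount: Long, num: Long): Long =
    math.min(totalRowCount, num)
}
```

    For a fraction-based sample the row count is only approximate, so the first helper describes the probability that is requested rather than the number of rows that come back.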

  105. lazy val schema : StructType

    Returns the definition of the columns in this DataFrame (the "relational schema" for the DataFrame).

    returns

    com.snowflake.snowpark.types.StructType

    Since

    0.1.0

  106. def select ( columns: Array [ String ] ) : DataFrame

    Returns a new DataFrame with a subset of named columns (similar to SELECT in SQL).

    For example:

    val dfSelected = df.select(Array("col1", "col2"))
    columns

    An array of the names of columns to return.

    returns

    A DataFrame

    Since

    0.7.0

  107. def select ( columns: Seq [ String ] ) : DataFrame

    Returns a new DataFrame with a subset of named columns (similar to SELECT in SQL).

    For example:

    val dfSelected = df.select(Seq("col1", "col2", "col3"))
    columns

    A list of the names of columns to return.

    returns

    A DataFrame

    Since

    0.2.0

  108. def select ( first: String , remaining: String * ) : DataFrame

    Returns a new DataFrame with a subset of named columns (similar to SELECT in SQL).

    For example:

    val dfSelected = df.select("col1", "col2", "col3")
    first

    The name of the first column to return.

    remaining

    A list of the names of the additional columns to return.

    returns

    A DataFrame

    Since

    0.1.0

  109. def select ( columns: Array [ Column ] ) : DataFrame

    Returns a new DataFrame with the specified Column expressions as output (similar to SELECT in SQL). Only the Columns specified as arguments will be present in the resulting DataFrame.

    You can use any Column expression.

    For example:

    val dfSelected =
      df.select(Array(df.col("col1"), lit("abc"), df.col("col1") + df.col("col2")))
    columns

    An array of expressions for the columns to return.

    returns

    A DataFrame

    Since

    0.7.0

  110. def select [ T ] ( columns: Seq [ Column ] ) ( implicit arg0: ClassTag [ T ] ) : DataFrame

    Returns a new DataFrame with the specified Column expressions as output (similar to SELECT in SQL). Only the Columns specified as arguments will be present in the resulting DataFrame.

    You can use any Column expression.

    For example:

    val dfSelected = df.select(Seq($"col1", substring($"col2", 0, 10), df("col3") + df("col4")))
    columns

    A list of expressions for the columns to return.

    returns

    A DataFrame

    Since

    0.2.0

  111. def select ( first: Column , remaining: Column * ) : DataFrame

    Returns a new DataFrame with the specified Column expressions as output (similar to SELECT in SQL). Only the Columns specified as arguments will be present in the resulting DataFrame.

    You can use any Column expression.

    For example:

    val dfSelected = df.select($"col1", substring($"col2", 0, 10), df("col3") + df("col4"))
    first

    The expression for the first column to return.

    remaining

    A list of expressions for the additional columns to return.

    returns

    A DataFrame

    Since

    0.1.0

  112. def show ( n: Int , maxWidth: Int ) : Unit

    Evaluates this DataFrame and prints out the first n rows with the specified maximum number of characters per column.

    n

    The number of rows to print out.

    maxWidth

    The maximum number of characters to print out for each column. If the number of characters exceeds the maximum, the method prints out an ellipsis (...) at the end of the column.

    Since

    0.5.0

  113. def show ( n: Int ) : Unit

    Evaluates this DataFrame and prints out the first n rows.

    n

    The number of rows to print out.

    Since

    0.1.0

  114. def show () : Unit

    Evaluates this DataFrame and prints out the first ten rows.

    Since

    0.1.0
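
    The three show overloads differ only in how many rows they print and how wide each column may be. The following is a minimal sketch of calling each one; the helper name preview is hypothetical, and df stands for any existing DataFrame. Each call is an action, so each one sends the query to the server again.

```scala
import com.snowflake.snowpark.DataFrame

// Hypothetical helper (not part of the Snowpark API): previews a DataFrame
// using each of the three show overloads documented above.
def preview(df: DataFrame): Unit = {
  df.show()       // prints the first ten rows
  df.show(3)      // prints the first three rows
  df.show(3, 20)  // prints the first three rows, at most 20 characters per column
}
```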

  115. def sort ( sortExprs: Array [ Column ] ) : DataFrame

    Sorts a DataFrame by the specified expressions (similar to ORDER BY in SQL).

    For example:

    val dfSorted = df.sort(Array(col("col1").asc, col("col2").desc, col("col3")))
    sortExprs

    An array of Column expressions for sorting the DataFrame.

    returns

    A DataFrame

    Since

    0.7.0

  116. def sort ( sortExprs: Seq [ Column ] ) : DataFrame

    Sorts a DataFrame by the specified expressions (similar to ORDER BY in SQL).

    For example:

    val dfSorted = df.sort(Seq($"colA", $"colB".desc))
    sortExprs

    A list of Column expressions for sorting the DataFrame.

    returns

    A DataFrame

    Since

    0.2.0

  117. def sort ( first: Column , remaining: Column * ) : DataFrame

    Sorts a DataFrame by the specified expressions (similar to ORDER BY in SQL).

    For example:

    val dfSorted = df.sort($"colA", $"colB".asc)
    first

    The first Column expression for sorting the DataFrame.

    remaining

    Additional Column expressions for sorting the DataFrame.

    returns

    A DataFrame

    Since

    0.1.0

  118. lazy val stat : DataFrameStatFunctions

    Returns a DataFrameStatFunctions object that provides statistic functions.

    Since

    0.2.0
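
    As a rough illustration of how the returned object might be used, the sketch below computes a correlation between two numeric columns. It assumes that DataFrameStatFunctions provides a corr method that takes two column names and returns an Option[Double]; check the DataFrameStatFunctions reference for the version you use. The helper name correlation is hypothetical.

```scala
import com.snowflake.snowpark.DataFrame

// Hypothetical helper. Assumes DataFrameStatFunctions.corr(col1, col2)
// returns Option[Double] (None when the result is NULL).
def correlation(df: DataFrame, colA: String, colB: String): Option[Double] =
  df.stat.corr(colA, colB)
```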

  119. final def synchronized [ T0 ] ( arg0: ⇒ T0 ) : T0
    Definition Classes
    AnyRef
  120. def toDF ( colNames: Array [ String ] ) : DataFrame

    Creates a new DataFrame containing the data in the current DataFrame but in columns with the specified names.

    You can use this method to assign column names when constructing a DataFrame. For example:

    val df = session.createDataFrame(Seq((1, 2), (3, 4))).toDF(Array("a", "b"))

    This returns a DataFrame containing the following:

    -------------
    |"A"  |"B"  |
    -------------
    |1    |2    |
    |3    |4    |
    -------------

    If you imported <session_var>.implicits._ , you can use the following syntax to create the DataFrame from a Seq and call toDF to assign column names to the returned DataFrame:

    import mysession.implicits._
    val df = Seq((1, 2), (3, 4)).toDF(Array("a", "b"))

    The number of column names that you pass in must match the number of columns in the current DataFrame.

    colNames

    An array of column names.

    returns

    A DataFrame

    Since

    0.7.0

  121. def toDF ( colNames: Seq [ String ] ) : DataFrame

    Creates a new DataFrame containing the data in the current DataFrame but in columns with the specified names.

    You can use this method to assign column names when constructing a DataFrame. For example:

    val df = session.createDataFrame(Seq((1, 2), (3, 4))).toDF(Seq("a", "b"))

    This returns a DataFrame containing the following:

    -------------
    |"A"  |"B"  |
    -------------
    |1    |2    |
    |3    |4    |
    -------------

    If you imported <session_var>.implicits._ , you can use the following syntax to create the DataFrame from a Seq and call toDF to assign column names to the returned DataFrame:

    import mysession.implicits._
    val df = Seq((1, 2), (3, 4)).toDF(Seq("a", "b"))

    The number of column names that you pass in must match the number of columns in the current DataFrame.

    colNames

    A list of column names.

    returns

    A DataFrame

    Since

    0.2.0

  122. def toDF ( first: String , remaining: String * ) : DataFrame

    Creates a new DataFrame containing the data in the current DataFrame but in columns with the specified names.

    You can use this method to assign column names when constructing a DataFrame. For example:

    val df = session.createDataFrame(Seq((1, 2), (3, 4))).toDF("a", "b")

    This returns a DataFrame containing the following:

    -------------
    |"A"  |"B"  |
    -------------
    |1    |2    |
    |3    |4    |
    -------------

    If you imported <session_var>.implicits._, you can use the following syntax to create the DataFrame from a Seq and call toDF to assign column names to the returned DataFrame:

    import mysession.implicits._
    val df = Seq((1, 2), (3, 4)).toDF("a", "b")

    The number of column names that you pass in must match the number of columns in the current DataFrame.

    first

    The name of the first column.

    remaining

    A list of the rest of the column names.

    returns

    A DataFrame

    Since

    0.1.0

  123. def toLocalIterator : Iterator [ Row ]

    Executes the query representing this DataFrame and returns an iterator of Row objects that you can use to retrieve the results.

    Unlike the collect method, this method does not load all data into memory at once.

    returns

    An Iterator of Row

    Since

    0.5.0
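
    Because the iterator yields rows as you consume them, it suits result sets that you want to process one row at a time. The following is a minimal sketch; the helper name printFirstColumn is hypothetical, and it assumes that the DataFrame has at least one column.

```scala
import com.snowflake.snowpark.{DataFrame, Row}

// Hypothetical helper: walks the result set row by row instead of
// materializing it with collect, and prints the first column of each row.
def printFirstColumn(df: DataFrame): Unit = {
  val rows: Iterator[Row] = df.toLocalIterator
  rows.foreach(row => println(row.get(0)))
}
```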

  124. def toString () : String
    Definition Classes
    AnyRef → Any
  125. def transformation ( funcName: String ) ( func: ⇒ DataFrame ) : DataFrame
    Attributes
    protected
    Annotations
    @inline ()
  126. def union ( other: DataFrame ) : DataFrame

    Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( other ), excluding any duplicate rows. Both input DataFrames must contain the same number of columns.

    For example:

    val df1and2 = df1.union(df2)
    other

    The other DataFrame that contains the rows to include.

    returns

    A DataFrame

    Since

    0.1.0

  127. def unionAll ( other: DataFrame ) : DataFrame

    Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( other ), including any duplicate rows. Both input DataFrames must contain the same number of columns.

    For example:

    val df1and2 = df1.unionAll(df2)
    other

    The other DataFrame that contains the rows to include.

    returns

    A DataFrame

    Since

    0.1.0

  128. def unionAllByName ( other: DataFrame ) : DataFrame

    Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( other ), including any duplicate rows.

    This method matches the columns in the two DataFrames by their names, not by their positions. The columns in the other DataFrame are rearranged to match the order of columns in the current DataFrame.

    For example:

    val df1and2 = df1.unionAllByName(df2)
    other

    The other DataFrame that contains the rows to include.

    returns

    A DataFrame

    Since

    0.9.0

  129. def unionByName ( other: DataFrame ) : DataFrame

    Returns a new DataFrame that contains all the rows in the current DataFrame and another DataFrame ( other ), excluding any duplicate rows.

    This method matches the columns in the two DataFrames by their names, not by their positions. The columns in the other DataFrame are rearranged to match the order of columns in the current DataFrame.

    For example:

    val df1and2 = df1.unionByName(df2)
    other

    The other DataFrame that contains the rows to include.

    returns

    A DataFrame

    Since

    0.1.0
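
    To make the difference between positional and by-name matching concrete, the sketch below unions two DataFrames whose columns appear in opposite order. The helper name unionByNameDemo and the column names are hypothetical; session is any active Session.

```scala
import com.snowflake.snowpark.{DataFrame, Session}

// Hypothetical helper: df2 lists its columns in the opposite order from df1.
def unionByNameDemo(session: Session): DataFrame = {
  val df1 = session.createDataFrame(Seq((1, 2))).toDF("a", "b")
  val df2 = session.createDataFrame(Seq((2, 1))).toDF("b", "a")
  // Columns are matched by name, so the row in df2 is read as a = 1, b = 2.
  // unionByName excludes duplicates, so the two identical rows collapse into one;
  // unionAllByName would keep both.
  df1.unionByName(df2)
}
```

    With the positional union method, the same inputs would be treated as two distinct rows, (1, 2) and (2, 1).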

  130. final def wait ( arg0: Long , arg1: Int ) : Unit
    Definition Classes
    AnyRef
    Annotations
    @throws ( ... )
  131. final def wait ( arg0: Long ) : Unit
    Definition Classes
    AnyRef
    Annotations
    @throws ( ... ) @native ()
  132. final def wait () : Unit
    Definition Classes
    AnyRef
    Annotations
    @throws ( ... )
  133. def where ( condition: Column ) : DataFrame

    Filters rows based on the specified conditional expression (similar to WHERE in SQL). This is equivalent to calling filter .

    For example:

    // The following two result in the same SQL query:
    pricesDF.filter($"price" > 100)
    pricesDF.where($"price" > 100)
    condition

    Filter condition defined as an expression on columns.

    returns

    A filtered DataFrame

    Since

    0.1.0

  134. def withColumn ( colName: String , col: Column ) : DataFrame

    Returns a DataFrame with an additional column with the specified name ( colName ). The column is computed by using the specified expression ( col ).

    If a column with the same name already exists in the DataFrame, that column is replaced by the new column.

    This example adds a new column named mean_price that contains the mean of the existing price column in the DataFrame.

    val dfWithMeanPriceCol = df.withColumn("mean_price", mean($"price"))
    colName

    The name of the column to add or replace.

    col

    The Column to add or replace.

    returns

    A DataFrame

    Since

    0.1.0
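
    The expression passed as col can also be computed row by row from existing columns. The following is a minimal sketch that assumes the DataFrame has numeric columns named price and quantity; the column names and the helper name withTotal are hypothetical.

```scala
import com.snowflake.snowpark.DataFrame
import com.snowflake.snowpark.functions.col

// Hypothetical helper: adds (or replaces) a "total" column computed
// from the assumed "price" and "quantity" columns of each row.
def withTotal(df: DataFrame): DataFrame =
  df.withColumn("total", col("price") * col("quantity"))
```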

  135. def withColumns ( colNames: Seq [ String ] , values: Seq [ Column ] ) : DataFrame

    Returns a DataFrame with additional columns with the specified names ( colNames ). The columns are computed by using the specified expressions ( values ).

    If columns with the same names already exist in the DataFrame, those columns are replaced by the new columns.

    This example adds new columns named mean_price and avg_price that contain the mean and average of the existing price column.

    val dfWithAddedColumns = df.withColumns(
        Seq("mean_price", "avg_price"), Seq(mean($"price"), avg($"price")))
    colNames

    A list of the names of the columns to add or replace.

    values

    A list of the Column objects to add or replace.

    returns

    A DataFrame

    Since

    0.1.0

  136. def withPlan ( plan: LogicalPlan ) : DataFrame
    Attributes
    protected
    Annotations
    @inline ()
  137. def write : DataFrameWriter

    Returns a DataFrameWriter object that you can use to write the data in the DataFrame to any supported destination. The default SaveMode for the returned DataFrameWriter is Append.

    Example:

    df.write.saveAsTable("table1")
    returns

    A DataFrameWriter

    Since

    0.1.0
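
    If appending is not what you want, you can set a different mode on the writer before saving. The following is a minimal sketch that replaces the contents of a table; the helper name overwriteTable is hypothetical, and the table name is supplied by the caller.

```scala
import com.snowflake.snowpark.{DataFrame, SaveMode}

// Hypothetical helper: writes df to the named table, replacing any
// existing contents instead of using the default Append mode.
def overwriteTable(df: DataFrame, tableName: String): Unit =
  df.write.mode(SaveMode.Overwrite).saveAsTable(tableName)
```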

Deprecated Value Members

  1. def finalize () : Unit
    Attributes
    protected[ lang ]
    Definition Classes
    AnyRef
    Annotations
    @throws ( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated
