Quick Reference: Snowpark Scala APIs for SQL Commands

This topic provides a quick reference of some of the Snowpark APIs that correspond to SQL commands.

(Note that this is not a complete list of the APIs that correspond to SQL commands.)

In this Topic:

Performing Queries

Selecting Columns

To select specific columns, use DataFrame.select.

Example of a SQL Statement

Example of Snowpark Code

SELECT id, name FROM sample_product_data;
val dfSelectedCols = df.select(col("id"), col("name"))
dfSelectedCols.show()

Renaming Columns

To rename a column, use Column.as, Column.alias, or Column.name.

Example of a SQL Statement

Example of Snowpark Code

SELECT id AS item_id FROM sample_product_data;
val dfRenamedCol = df.select(col("id").as("item_id"))
dfRenamedCol.show()
val dfRenamedCol = df.select(col("id").alias("item_id"))
dfRenamedCol.show()
val dfRenamedCol = df.select(col("id").name("item_id"))
dfRenamedCol.show()

Filtering Data

To filter data, use DataFrame.filter or DataFrame.where.

Example of a SQL Statement

Example of Snowpark Code

SELECT * FROM sample_product_data WHERE id = 1;
val dfFilteredRows = df.filter((col("id") === 1))
dfFilteredRows.show()
val dfFilteredRows = df.where((col("id") === 1))
dfFilteredRows.show()

Sorting Data

To sort data, use DataFrame.sort.

Example of a SQL Statement

Example of Snowpark Code

SELECT * FROM sample_product_data ORDER BY category_id;
val dfSorted = df.sort(col("category_id"))
dfSorted.show()

Limiting the Number of Rows Returned

To limit the number of rows returned, use DataFrame.limit. See Limiting the Number of Rows in a DataFrame.

Example of a SQL Statement

Example of Snowpark Code

SELECT * FROM sample_product_data
  ORDER BY category_id LIMIT 2;
val dfSorted = df.sort(col("category_id")).limit(2);
val arrayRows = dfSorted.collect()

Performing Joins

To perform a join, use DataFrame.join or DataFrame.naturalJoin. See Joining DataFrames.

Example of a SQL Statement

Example of Snowpark Code

SELECT * FROM sample_a
  INNER JOIN sample_b
  on sample_a.id_a = sample_b.id_a;
val dfJoined =
  dfLhs.join(dfRhs, dfLhs.col("id_a") === dfRhs.col("id_a"))
dfJoined.show()
SELECT * FROM sample_a NATURAL JOIN sample_b;
val dfJoined = dfLhs.naturalJoin(dfRhs)
dfJoined.show()

Querying Semi-Structured Data

To traverse semi-structured data, use Column.apply(“<field_name>”) and Column.apply(<index>). See Working with Semi-Structured Data.

Example of a SQL Statement

Example of Snowpark Code

SELECT src:salesperson.name FROM car_sales;
dfJsonField =
  df.select(col("src")("salesperson")("name"))
dfJsonField.show()

Grouping and Aggregating Data

To group data, use DataFrame.groupBy. This returns a RelationalGroupedDataFrame object, which you can use to perform the aggregations.

Example of a SQL Statement

Example of Snowpark Code

SELECT category_id, count(*)
  FROM sample_product_data GROUP BY category_id;
val dfCountPerCategory = df.groupBy(col("category")).count()
dfCountPerCategory.show()

Calling Window Functions

To call a window function, use the Window object methods to build a WindowSpec object, which in turn you can use for windowing functions (similar to using ‘<function> OVER … PARTITION BY … ORDER BY’).

Example of a SQL Statement

Example of Snowpark Code

SELECT category_id, price_date, SUM(amount) OVER
  (PARTITION BY category_id ORDER BY price_date)
  FROM prices ORDER BY price_date;
val window = Window.partitionBy(
  col("category")).orderBy(col("price_date"))
val dfCumulativePrices = dfPrices.select(
  col("category"), col("price_date"),
  sum(col("amount")).over(window)).sort(col("price_date"))
dfCumulativePrices.show()

Updating, Deleting, and Merging Rows

To update, delete, and merge rows in a table, use Updatable. See Updating, Deleting, and Merging Rows in a Table.

Example of a SQL Statement

Example of Snowpark Code

UPDATE sample_product_data
  SET serial_number = 'xyz' WHERE id = 12;
val updateResult =
  updatableDf.update(
    Map("serial_number" -> lit("xyz")),
    col("id") === 12)
DELETE FROM sample_product_data
  WHERE category_id = 50;
val deleteResult =
  updatableDf.delete(updatableDf("category_id") === 50)
MERGE  INTO target_table USING source_table
  ON target_table.id = source_table.id
  WHEN MATCHED THEN
    UPDATE SET target_table.description =
      source_table.description;
val mergeResult =
   target.merge(source, target("id") === source("id"))
  .whenMatched.update(Map("description" -> source("description")))
  .collect()

Working with Stages

For more information on working with stages, see Working With Files in a Stage.

Uploading and Downloading Files from a Stage

To upload and download files from a stage, use FileOperation. See Uploading and Downloading Files in a Stage.

Example of a SQL Statement

Example of Snowpark Code

PUT file:///tmp/*.csv @myStage OVERWRITE = TRUE;
val putOptions = Map("OVERWRITE" -> "TRUE")
val putResults = session.file.put(
  "file:///tmp/*.csv", "@myStage", putOptions)
GET @myStage file:///tmp PATTERN = '.*.csv.gz';
val getOptions = Map("PATTERN" -> s"'.*.csv.gz'")
val getResults = session.file.get(
 "@myStage", "file:///tmp", getOptions)

Reading Data from Files in a Stage

To read data from files in a stage, use DataFrameReader to create a DataFrame for the data. See Setting Up a DataFrame for Files in a Stage.

Example of a SQL Statement

Example of Snowpark Code

CREATE FILE FORMAT snowpark_temp_format TYPE = JSON;
SELECT "$1"[0]['salesperson']['name'] FROM (
  SELECT $1::VARIANT AS "$1" FROM @mystage/car_sales.json(
    FILE_FORMAT => 'snowpark_temp_format')) LIMIT 10;
DROP FILE FORMAT snowpark_temp_format;
val df = session.read.json(
  "@mystage/car_sales.json").select(
    col("$1")(0)("salesperson")("name"))
df.show();

Copying Data from Files in a Stage to a Table

To copy data from files in a stage to a table, use DataFrameReader to create a CopyableDataFrame for the data, and use the CopyableDataFrame.copyInto method to copy the data to the table. See Copying Data from Files into a Table.

Example of a SQL Statement

Example of Snowpark Code

COPY INTO new_car_sales
  FROM @mystage/car_sales.json
  FILE_FORMAT = (TYPE = JSON);
val dfCopyableDf = session.read.json("@mystage/car_sales.json")
dfCopyableDf.copyInto("new_car_sales")

Saving a DataFrame to Files on a Stage

To save a DataFrame to files on a stage, use the DataFrameWriter method named after the format of the files that you want to use. See Saving a DataFrame to Files on a Stage.

Example of a SQL Statement

Example of Snowpark Code

COPY INTO @mystage/saved_data.json
  FROM (  SELECT  *  FROM (car_sales) )
  FILE_FORMAT = ( TYPE = JSON COMPRESSION = 'none' )
  OVERWRITE = TRUE
  DETAILED_OUTPUT = TRUE
val df = session.table("car_sales")
val writeFileResult = df.write.mode(
  SaveMode.Overwrite).option(
  "DETAILED_OUTPUT", "TRUE").option(
  "compression", "none").json(
  "@mystage/saved_data.json")

Creating and Calling User-Defined Functions (UDFs)

To create a Scala function that serves as a UDF (an anonymous UDF), use udf.

To create a temporary or permanent UDF that you can call by name, use UDFRegistration.registerTemporary or UDFRegistration.registerPermanent.

To call a permanent UDF by name, use callUDF.

For details, see Creating User-Defined Functions (UDFs) for DataFrames in Scala and Calling Scalar User-Defined Functions (UDFs).

Example of a SQL Statement

Example of Snowpark Code

CREATE FUNCTION <temp_function_name>
  RETURNS INT
  LANGUAGE JAVA
  ...
  AS
  ...;

SELECT ...,
  <temp_function_name>(amount) AS doublenum
  FROM sample_product_data;
val doubleUdf = udf((x: Int) => x + x)
val dfWithDoubleNum = df.withColumn(
 "doubleNum", doubleUdf(col("amount")))
dfWithDoubleNum.show()
CREATE FUNCTION <temp_function_name>
  RETURNS INT
  LANGUAGE JAVA
  ...
  AS
  ...;

SELECT ...,
  <temp_function_name>(amount) AS doublenum
  FROM sample_product_data;
session.udf.registerTemporary(
  "doubleUdf", (x: Int) => x + x)
val dfWithDoubleNum = df.withColumn(
 "doubleNum", callUDF("doubleUdf", (col("amount"))))
dfWithDoubleNum.show()
CREATE FUNCTION doubleUdf(arg1 INT)
  RETURNS INT
  LANGUAGE JAVA
  ...
  AS
  ...;

SELECT ...,
  doubleUdf(amount) AS doublenum
  FROM sample_product_data;
session.udf.registerPermanent(
  "doubleUdf", (x: Int) => x + x, "mystage")
val dfWithDoubleNum = df.withColumn(
 "doubleNum", callUDF("doubleUdf", (col("amount"))))
dfWithDoubleNum.show()
Back to top