Writing Stored Procedures in Snowpark (Scala)

With this preview feature, you can write a stored procedure in Scala. You can use the Snowpark library within your stored procedure to perform queries, updates, and other work on tables in Snowflake.

This topic explains how to write a stored procedure in Scala.

Introduction

With Snowpark Stored Procedures, you can build and run your data pipeline within Snowflake, using a Snowflake warehouse as the compute framework. For the code for your data pipeline, you use the Snowpark API for Scala to write stored procedures. To schedule the execution of these stored procedures, you use tasks.
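
For example, the following statement creates a task that calls a stored procedure every hour (a sketch; the task, warehouse, and procedure names are illustrative, and the procedure is assumed to already exist):

CREATE TASK my_pipeline_task
  WAREHOUSE = mywh
  SCHEDULE = '60 MINUTE'
AS
  CALL myProc('table_a', 'table_b', 5);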

In this preview, Snowpark Stored Procedures have the following limitations:

  • Concurrency is not supported in stored procedures. For example, from within your stored procedure, you cannot submit queries from multiple threads.

  • If you are executing your stored procedure from a task, you must specify a warehouse when creating the task. (You cannot use Snowflake-managed compute resources to run the task.)

  • You cannot use some of the Snowpark APIs in your stored procedure. See Accessing Data in Snowflake from Your Stored Procedure for the list of unsupported APIs.

The next sections provide more information about Snowpark Stored Procedures for Scala.

Prerequisites

You must use version 1.1.0 or later of the Snowpark library.

If you are writing a pre-compiled stored procedure, you must compile your classes to run in Java version 11.x.
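
For example, if you build with sbt, you can target Java 11 explicitly (a sketch; the -release flag assumes a recent Scala 2.12.x compiler running on JDK 11 or later):

// build.sbt fragment (illustrative): compile against the Java 11 platform API
scalacOptions ++= Seq("-release", "11")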

Setting Up Your Development Environment for Snowpark

Next, set up your development environment to use the Snowpark library. See Setting Up Your Development Environment for Snowpark Scala.

Choosing to Create an In-Line or Pre-Compiled Stored Procedure

As is the case with Java UDFs, you can either create an in-line stored procedure or a pre-compiled stored procedure.

  • In an in-line stored procedure, you write your Scala code in the AS clause of the CREATE PROCEDURE statement. For example:

    CREATE OR REPLACE PROCEDURE MYPROC(fromTable STRING, toTable STRING, count INT)
      RETURNS STRING
      LANGUAGE SCALA
      RUNTIME_VERSION = '2.12'
      PACKAGES = ('com.snowflake:snowpark:latest')
      HANDLER = 'MyScalaObject.run'
      AS
      $$
        object MyScalaObject {
          def run(session: com.snowflake.snowpark.Session, fromTable: String, toTable: String, count: Int): String = {
            session.table(fromTable).limit(count).write.saveAsTable(toTable)
            return "SUCCESS"
          }
        }
      $$;
    

    Note

    For faster execution on repeated calls, you can set TARGET_PATH to the location of a JAR file in which Snowflake should save the compiled classes. See Creating the Stored Procedure.

  • In a pre-compiled stored procedure, you write your Scala code in a .scala source file.

    For example:

    object MyScalaObject {
      def run(session: com.snowflake.snowpark.Session, fromTable: String, toTable: String, count: Int): String = {
        session.table(fromTable).limit(count).write.saveAsTable(toTable)
        return "SUCCESS"
      }
    }
    

    You then compile your code, package the classes in a JAR file, upload the JAR file to a stage, and execute the CREATE PROCEDURE command, pointing to the JAR file on the stage. For example:

    CREATE OR REPLACE PROCEDURE MYPROC(fromTable STRING, toTable STRING, count INT)
      RETURNS STRING
      LANGUAGE SCALA
      RUNTIME_VERSION = '2.12'
      PACKAGES = ('com.snowflake:snowpark:latest')
      IMPORTS = ('@mystage/MyCompiledScalaCode.jar')
      HANDLER = 'MyScalaObject.run';
    

Writing the Scala Code for the Stored Procedure

You write the code for your stored procedure as a Scala method or function. The following sections provide guidelines for writing this code:

Planning to Write Your Stored Procedure

The Scala code for your stored procedure has the same constraints as the code for a Java UDF. When planning to write your stored procedures, you should take those constraints into consideration.

Limit the Amount of Memory Consumed

As is the case with Java UDFs, Snowflake places limits on a method or function in terms of the amount of memory needed.

In your method or function, you should avoid consuming too much memory.

Write Thread-Safe Code

As is the case with Java UDFs, you should make sure that your method or function is thread safe.

Understand the Security Restrictions

Your method or function runs within a restricted engine, so you should follow the same rules as documented for Java UDFs.

Decide on Using Owner’s Rights or Caller’s Rights

In addition, when planning to write your stored procedure, consider whether you want the stored procedure to run with caller’s rights or owner’s rights.

Writing the Class or Object

The method or function that you define should be part of a class or object.

When writing the class or object, note the following:

  • The class (or object) and method must not be protected or private.

  • If the method is not static and you want to define a constructor, define a zero-argument constructor for the class. Snowflake invokes this zero-argument constructor at initialization time to create an instance of your class (see the sketch after this list).

  • You can define different methods for different stored procedures in the same class or object.
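
For example, the following sketch (the names are illustrative) shows a class with a zero-argument constructor and two methods, each of which could serve as the handler for a different stored procedure:

class MyHandlers {
  // The implicit zero-argument constructor runs once, when Snowflake
  // creates the instance at initialization time.
  private val initializedAt = java.time.Instant.now()

  def procOne(session: com.snowflake.snowpark.Session): String =
    s"instance initialized at $initializedAt"

  def procTwo(session: com.snowflake.snowpark.Session, name: String): String =
    s"hello, $name"
}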

Writing the Method or Function

When writing the method or function for the stored procedure, note the following:

  • Specify the Snowpark Session object as the first argument of your method or function.

    When you call your stored procedure, Snowflake automatically creates a Session object and passes it to your stored procedure. (You cannot create the Session object yourself.)

  • For the rest of the arguments and for the return value, use the Scala types that correspond to Snowflake data types.

  • Your method or function must return a value. A return value is required for stored procedures written in Scala; the handler cannot return Unit.

Accessing Data in Snowflake from Your Stored Procedure

To access data in Snowflake, use the Snowpark library APIs.

When handling a call to your Scala stored procedure, Snowflake creates a Snowpark Session object and passes the object to the method or function for your stored procedure.

As is the case with stored procedures in other languages, the context for the session (e.g. the privileges, current database and schema, etc.) is determined by whether the stored procedure runs with caller’s rights or owner’s rights. For details, see Session State.

You can use this Session object to call APIs in the Snowpark library. For example, you can create a DataFrame for a table or execute a SQL statement.

See the Snowpark Developer Guide for Scala for more information.
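
For example, the following sketch (the table name and query are illustrative) uses the Session object both to create a DataFrame for a table and to execute a SQL statement:

object SessionSketch {
  def run(session: com.snowflake.snowpark.Session): String = {
    // Create a DataFrame for a table and count its rows.
    val rowCount = session.table("MY_TABLE").count()

    // Execute a SQL statement and read the single-row result.
    val warehouse = session.sql("SELECT CURRENT_WAREHOUSE()").collect()(0).getString(0)

    s"$rowCount rows, queried from warehouse $warehouse"
  }
}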

Note

You cannot use the following Snowpark APIs in a stored procedure:

The following is an example of a Scala method that copies a specified number of rows from one table to another table. The method takes the following arguments:

  • A Snowpark Session object

  • The name of the table to copy the rows from

  • The name of the table to save the rows to

  • The number of rows to copy

The method in this example returns a string.

object MyObject
{
  def myProcedure(session: com.snowflake.snowpark.Session, fromTable: String, toTable: String, count: Int): String =
  {
    session.table(fromTable).limit(count).write.saveAsTable(toTable)
    return "Success"
  }
}

The following example defines a function, rather than a method:

object MyObject
{
  val myProcedure = (session: com.snowflake.snowpark.Session, fromTable: String, toTable: String, count: Int) =>
  {
    session.table(fromTable).limit(count).write.saveAsTable(toTable)
    "Success"
  }
}

Accessing Other Classes and Resource Files

If your code depends on classes defined outside of the stored procedure (e.g. classes in a separate JAR file) or resource files, you must upload those files to a stage so that the files will be available when the stored procedure executes.

Later, when executing the CREATE PROCEDURE statement, use the IMPORTS clause to point to these files.
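
For example, an IMPORTS clause that makes a dependency JAR and a resource file available might look like this (the stage and file names are illustrative):

IMPORTS = ('@mystage/dependencies.jar', '@mystage/config.properties')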

Preparing a Pre-compiled Stored Procedure

If you plan to create a pre-compiled stored procedure (rather than an in-line stored procedure), you must compile and package your classes in a JAR file, and you must upload the JAR file to a stage.

Compiling and Packaging Your Scala Code

To make it easier to set up your stored procedure, build a JAR file that contains all of the dependencies needed for your stored procedure. Later, you’ll need to upload the JAR file to a stage and point to the JAR file from your CREATE PROCEDURE statement. This process is simpler if you have fewer JAR files to upload and point to.

The next sections provide some tips on creating a JAR file that contains all of the dependencies:

Using SBT to Build a JAR File With Dependencies

If you are using SBT to build and package your code, you can use the sbt-assembly plugin to create a JAR file containing all of the dependencies.

  1. In the directory containing your build.sbt file, create a file named plugins.sbt in the project/ subdirectory.

    For example, if the directory containing your build.sbt file is hello-snowpark/, create the file hello-snowpark/project/plugins.sbt:

    hello-snowpark/
                 |__ build.sbt
                 |__ project/
                            |__ plugins.sbt
    
  2. In the plugins.sbt file, add the following line:

    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0")
    
  3. If your project requires multiple versions of the same library (e.g. if your project depends on two libraries that require different versions of a third library), define a merge strategy in your build.sbt file to resolve the dependencies. See Merge Strategy for details.

  4. In your build.sbt file, update the Snowpark library dependency to at least the minimum required version (1.1.0). For example:

    libraryDependencies += "com.snowflake" % "snowpark" % "1.1.0" % "provided"
    

    In addition, exclude the Snowpark library from the JAR file by specifying that the dependency is "provided" (as shown above).

  5. Change to the directory for your project (e.g. hello-snowpark), and run the following command:

    sbt assembly
    

    Note

    If you encounter the error Not a valid command: assembly, Not a valid project ID: assembly, or Not a valid key: assembly, make sure that the plugins.sbt file is in the subdirectory named project/ (as mentioned in step 1).

    This command creates a JAR file in the following location:

    target/scala-<version>/<project-name>-assembly-1.0.jar
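
Taken together, a minimal build.sbt for this kind of build might look like the following sketch (the project name, Scala version, and Snowpark version are illustrative):

name := "hello-snowpark"
version := "1.0"
scalaVersion := "2.12.15"

// "provided" keeps the Snowpark library out of the assembled JAR;
// Snowflake supplies the library at runtime via the PACKAGES clause.
libraryDependencies += "com.snowflake" % "snowpark" % "1.1.0" % "provided"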
    

Using Maven to Build a JAR File With Dependencies

If you are using Maven to build and package your code, you can use the Maven Assembly Plugin to create a JAR file that contains all of the dependencies.

  1. In the directory for your project (e.g. hello-snowpark/), create a subdirectory named assembly/.

  2. In that directory, create an assembly descriptor file that specifies that you want to include dependencies in your JAR file.

    For an example, see jar-with-dependencies.

  3. In the assembly descriptor, add a <dependencySet> element that excludes the Snowpark library from your JAR file.

    For example:

    <assembly xmlns="http://maven.apache.org/ASSEMBLY/2.1.0"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.1.0 http://maven.apache.org/xsd/assembly-2.1.0.xsd">
      <id>jar-with-dependencies</id>
      <formats>
         <format>jar</format>
      </formats>
      <includeBaseDirectory>false</includeBaseDirectory>
      <dependencySets>
        <dependencySet>
          <outputDirectory>/</outputDirectory>
          <useProjectArtifact>false</useProjectArtifact>
          <unpack>true</unpack>
          <scope>provided</scope>
          <excludes>
            <exclude>com.snowflake:snowpark</exclude>
          </excludes>
        </dependencySet>
      </dependencySets>
    </assembly>
    

    For information about the elements in an assembly descriptor, see Assembly Descriptor Format.

  4. In your pom.xml file, under <project> » <build> » <plugins>, add a <plugin> element for the Maven Assembly Plugin.

    In addition, under <configuration> » <descriptors>, add a <descriptor> that points to the assembly descriptor file that you created in the previous steps.

    For example:

    <project>
      [...]
      <build>
        [...]
        <plugins>
          <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.3.0</version>
            <configuration>
              <descriptors>
                <descriptor>src/assembly/jar-with-dependencies.xml</descriptor>
              </descriptors>
            </configuration>
            [...]
          </plugin>
          [...]
        </plugins>
        [...]
      </build>
      [...]
    </project>
    

Using Other Tools to Build a JAR File With Dependencies

If you are not using SBT or Maven, see the documentation for your build tool for instructions on building a JAR file with all of the dependencies.

For example, if you are using an IntelliJ IDEA project (not an SBT project in IntelliJ), see the instructions on setting up an artifact configuration.

Uploading Files to a Stage

Next, upload the files required for your stored procedure to a stage:

  1. Choose a stage for your files.

    You can use an external or named internal stage. If you plan to use the PUT command to upload the files, use a named internal stage. (The PUT command does not support uploading files to external stages.)

    You can use an existing stage, or you can create a new stage by executing CREATE STAGE. For example, the following command creates a new internal stage named mystage:

    CREATE STAGE mystage;
    

    Note

    The owner of the stored procedure must have the READ privilege on the stage.

  2. Use the PUT command to upload the following files to that stage:

    • The JAR file containing your compiled Scala code.

    • Any other files that your stored procedure depends on.

    For example:

    PUT file:///Users/MyUserName/myjar.jar
            @mystage
            AUTO_COMPRESS = FALSE
            OVERWRITE = TRUE
            ;
    

    Note

    If you omit AUTO_COMPRESS = FALSE, the PUT command automatically compresses the file. The name of the compressed file on the stage will be myjar.jar.gz. Later, when you execute the CREATE PROCEDURE command, you will need to specify the filename with this .gz extension in the IMPORTS clause.
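
    In that case, the IMPORTS clause would reference the compressed name. For example (a sketch, using the stage and file from the example above):

    IMPORTS = ('@mystage/myjar.jar.gz')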

Note that if you delete or rename the JAR file containing your compiled Scala code, you can no longer call the stored procedure.

If you need to update your JAR file, then:

  • Update it while no calls to the stored procedure can be made.

  • Add the clause OVERWRITE=TRUE to the PUT command.

Creating the Stored Procedure

Next, execute the CREATE PROCEDURE statement to create a stored procedure for your method or function. Set the following parameters:

[ arg_name arg_data_type [, ... ] ]

  • Omit the argument for the Snowpark Session object. As mentioned earlier, this is not a formal parameter that you specify in CREATE PROCEDURE or CALL. When you call your stored procedure, Snowflake creates a Session object and passes it to your stored procedure.

  • For the data types of the rest of the arguments, use the Snowflake data types that correspond to the Scala types of the arguments in your method or function.

RETURNS result_data_type

Specify RETURNS with the Snowflake data type of your return value.

LANGUAGE SCALA

You must specify this to indicate that your stored procedure code is written in Scala.

RUNTIME_VERSION = '2.12'

You must specify this to indicate that your stored procedure uses Scala 2.12.

PACKAGES = ( 'package_name' )

In this list, include the package for the version of the Snowpark library that you want to use.

Specify the fully qualified package name for the Snowpark library in the following format:

com.snowflake:snowpark:<version>

For example:

  • To use the latest version of Snowpark, use:

    PACKAGES = ('com.snowflake:snowpark:latest')
    
  • To use the 1.4.0 version of Snowpark, use:

    PACKAGES = ('com.snowflake:snowpark:1.4.0')

Note

When creating a new stored procedure, specify version 1.1.0 or later.

Snowflake supports using Snowpark version 0.9.0 or later in a stored procedure. Note, however, that these versions have limitations. For example, versions prior to 1.1.0 do not support the use of transactions in a stored procedure.

For the list of supported packages and versions, query the INFORMATION_SCHEMA.PACKAGES view for rows with LANGUAGE = 'scala'. For example:

select * from information_schema.packages where language = 'scala';

IMPORTS = ( 'file' [, 'file' ... ] )

If your stored procedure depends on any JAR files that you uploaded to a stage location, include those files in this list.

If you are writing an in-line stored procedure, you can omit this clause, unless your code depends on classes defined outside the stored procedure or resource files.

If you are writing a pre-compiled stored procedure, you must also include the JAR file containing the definition of the stored procedure.

HANDLER = 'method_name'

Set this to the fully qualified name of your Scala method or function.

TARGET_PATH = 'jar_file'

If you are writing an in-line stored procedure and you do not want Snowflake to recompile your code on every call, you can set this to the path of the JAR file in which Snowflake should save your compiled code.

This must be a location on a stage for which you have the WRITE privilege.

EXECUTE AS CALLER

If you want the stored procedure to run with caller’s rights, add this parameter.

Otherwise, if you want to use owner’s rights, omit this parameter.

The following examples create stored procedures in Scala.

Example 1: In-line stored procedure:

CREATE OR REPLACE PROCEDURE myProc(fromTable STRING, toTable STRING, count INT)
RETURNS STRING
LANGUAGE SCALA
RUNTIME_VERSION = '2.12'
PACKAGES = ('com.snowflake:snowpark:latest')
HANDLER = 'MyObject.myProcedure'
AS
$$
  object MyObject
  {
    def myProcedure(session: com.snowflake.snowpark.Session, fromTable: String, toTable: String, count: Int): String =
    {
      session.table(fromTable).limit(count).write.saveAsTable(toTable)
      return "Success"
    }
  }
$$;

Example 2: Pre-compiled stored procedure that uses compiled code in the JAR file myjar.jar on the internal stage mystage:

CREATE OR REPLACE PROCEDURE myProc(fromTable STRING, toTable STRING, count INT)
RETURNS STRING
LANGUAGE SCALA
RUNTIME_VERSION = '2.12'
PACKAGES = ('com.snowflake:snowpark:latest')
IMPORTS = ('@mystage/myjar.jar')
HANDLER = 'MyObject.myProcedure';
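
The following sketch combines some of the optional parameters: an in-line procedure that caches its compiled code with TARGET_PATH and runs with caller’s rights (the stage and JAR file names are illustrative):

CREATE OR REPLACE PROCEDURE myProc(fromTable STRING, toTable STRING, count INT)
RETURNS STRING
LANGUAGE SCALA
RUNTIME_VERSION = '2.12'
PACKAGES = ('com.snowflake:snowpark:latest')
HANDLER = 'MyObject.myProcedure'
TARGET_PATH = '@mystage/myProcCompiled.jar'
EXECUTE AS CALLER
AS
$$
  object MyObject
  {
    def myProcedure(session: com.snowflake.snowpark.Session, fromTable: String, toTable: String, count: Int): String =
    {
      session.table(fromTable).limit(count).write.saveAsTable(toTable)
      return "Success"
    }
  }
$$;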

Calling Your Stored Procedure

In order for a user to call a stored procedure, the user’s role must have the USAGE privilege on the stored procedure.
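
For example, the owner might grant this privilege as follows (a sketch; the role name is illustrative, and the grant must list the argument data types):

GRANT USAGE ON PROCEDURE myProc(STRING, STRING, INT) TO ROLE analyst_role;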

Once you have this privilege, you can use the CALL statement to call the stored procedure. For example:

CALL myProc('table_a', 'table_b', 5);

Appendix: Mapping Scala Data Types to Snowflake Data Types

For the arguments and return values of stored procedures, Snowflake supports the following Scala data types in addition to the Java types listed in SQL-Java Data Type Mappings for Parameters and Return Types. Each SQL data type is listed with the corresponding Scala data types:

  • NUMBER: Int or Option[Int], or Long or Option[Long]

  • FLOAT: Float or Option[Float]

  • DOUBLE: Double or Option[Double]

  • VARCHAR: String

  • BOOLEAN: Boolean or Option[Boolean]

  • BINARY: Array[Byte]

  • VARIANT: String. The value is formatted depending on the type that it represents; a variant null is formatted as the string “null”.

  • ARRAY: Array[String]

  • OBJECT: Map[String, String]

For DATE and TIMESTAMP, use the Java types listed in SQL-Java Data Type Mappings for Parameters and Return Types.
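
For example, the following sketch (the names are illustrative) declares an Option[Int] parameter so that a SQL NULL argument, which arrives as None, can be handled without an error:

object NullSafeSketch {
  def run(session: com.snowflake.snowpark.Session, maybeCount: Option[Int]): String =
    maybeCount match {
      case Some(n) => s"count = $n"
      case None    => "count was NULL"
    }
}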
