Setting Up a Jupyter Notebook for Snowpark Scala

This topic explains how to set up a Jupyter notebook for Snowpark.

Setting Up Jupyter Notebooks for Scala Development

Make sure that Jupyter is set up to use Scala. For example, you can install the Almond kernel.

Note

When using coursier to install the Almond kernel, specify a supported version of Scala.

Creating a New Notebook in a New Folder

The Snowpark library requires access to the directory that contains classes generated by the Scala REPL. If you are planning to use multiple notebooks, you must use a separate REPL class directory for each notebook.

To make it easier to set up a separate REPL class directory for each notebook, create a separate folder for each notebook:

  1. In the Notebook Dashboard, click New » Folder to create a new folder for a notebook.

  2. Select the checkbox next to the folder, click Rename, and assign a new name for the folder.

  3. Click the link for the folder to navigate into the folder.

  4. Click New » Scala to create a new notebook in that folder.

Configuring the Jupyter Notebook for Snowpark

Next, configure the Jupyter notebook for Snowpark.

  1. In a new cell, run the following commands to define a variable for a directory:

    val replClassPathObj = os.Path("replClasses", os.pwd)
    if (!os.exists(replClassPathObj)) os.makeDir(replClassPathObj)
    val replClassPath = replClassPathObj.toString()
    

    This does the following:

    • Defines an os.Path variable and a String variable for a directory for classes generated by the Scala REPL.

    • Creates that directory, if that directory does not already exist.

    The Scala REPL generates classes for the Scala code that you write, including your code that defines UDFs. The Snowpark library uses this directory to find and upload the classes for your UDFs that are generated by the REPL.

    Note

    If you are using multiple notebooks, you’ll need to create and configure a separate REPL class directory for each notebook. For simplicity, you can just put each notebook in a separate folder, as explained in Creating a New Notebook in a New Folder.

  2. Run the following commands in a cell to configure the compiler for the Scala REPL:

    interp.configureCompiler(_.settings.outputDirs.setSingleOutput(replClassPath))
    interp.configureCompiler(_.settings.Yreplclassbased)
    interp.load.cp(replClassPathObj)
    

    This does the following:

    • Configures the compiler to generate classes for the REPL in the directory that you created earlier.

    • Configures the compiler to wrap code entered in the REPL in classes, rather than in objects.

    • Adds the directory that you created earlier as a dependency of the REPL interpreter.

  3. Create a new session in Snowpark, and add the REPL class directory that you created earlier as a dependency. For example:

    // Import the Snowpark library from Maven.
    import $ivy.`com.snowflake:snowpark:1.10.0`
    
    import com.snowflake.snowpark._
    import com.snowflake.snowpark.functions._
    
    val session = Session.builder.configs(Map(
        "URL" -> "https://<account_identifier>.snowflakecomputing.com",
        "USER" -> "<username>",
        "PASSWORD" -> "<password>",
        "ROLE" -> "<role_name>",
        "WAREHOUSE" -> "<warehouse_name>",
        "DB" -> "<database_name>",
        "SCHEMA" -> "<schema_name>"
    )).create
    
    // Add the directory for REPL classes that you created earlier.
    session.addDependency(replClassPath)

    See Creating a Session for Snowpark Scala for an explanation of the Map keys.

  4. Run the following commands in a cell to add the Ammonite kernel classes as dependencies for your UDF:

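    // Find the file that contains the given class and add it as a dependency of the session.
    def addClass(session: Session, className: String): String = {
      val cls = Class.forName(className)
      val resourceName = "/" + cls.getName().replace(".", "/") + ".class"
      val url = cls.getResource(resourceName)
      // Drop the URL scheme prefix (for example, "jar:file:") and the "!/<entry>" suffix to get the file path.
      val path = url.getPath().split(":").last.split("!").head
      session.addDependency(path)
      path
    }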
    addClass(session, "ammonite.repl.ReplBridge$")
    addClass(session, "ammonite.interp.api.APIHolder")
    addClass(session, "pprint.TPrintColors")
    

    Note

    If you plan to create UDFs that have dependencies that are available through Maven, you can use the addClass method defined above to add those dependencies:

    addClass(session, "<dependency_package>.<dependency_class>")
    

    If you need to specify a dependency in a JAR file, call interp.load.cp to load the JAR file for the REPL interpreter, and call session.addDependency to add the JAR file as a dependency for your UDFs:

    interp.load.cp(os.Path("<path to jar file>/<jar file>"))
    session.addDependency("<path to jar file>/<jar file>")
    
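To see what the path-handling line in addClass does, the following sketch applies the same string operations to a sample value of the kind that url.getPath() returns for a class loaded from a JAR file. The helper name jarPathFromUrlPath and the sample path are hypothetical and are used only for illustration.

```scala
// Hypothetical helper that mirrors the string handling in addClass.
def jarPathFromUrlPath(urlPath: String): String =
  urlPath.split(":").last.split("!").head

// Sample value (an assumption for illustration): the path portion of a "jar:" URL.
val sample = "file:/home/user/libs/example.jar!/com/example/MyClass.class"
val jarPath = jarPathFromUrlPath(sample)
println(jarPath) // prints /home/user/libs/example.jar
```

A path that contains neither ":" nor "!" is returned unchanged. Because the logic splits on every colon, a path that contains a drive letter such as C: would lose that prefix, so this sketch assumes Unix-style paths.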

Verifying Your Jupyter Notebook Configuration

Run the following commands in a cell to verify that you can define and call an anonymous user-defined function (UDF):

class UDFCode extends Serializable {
  val appendLastNameFunc = (s: String) => {
    s"$s Johnson"
  }
}
// Define an anonymous UDF.
val appendLastNameUdf = udf((new UDFCode).appendLastNameFunc)
// Create a DataFrame that has a column NAME with a single row with the value "Raymond".
val df = session.sql("select 'Raymond' NAME")
// Call the UDF, passing in the values in the NAME column.
// Return a new DataFrame that has an additional column "Full Name" that contains the value returned by the UDF.
df.withColumn("Full Name", appendLastNameUdf(col("NAME"))).show()
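If the verification cell fails, it can help to separate problems in the function itself from problems in the notebook or session configuration. The following sketch calls the function locally, without contacting Snowflake. It assumes nothing beyond the UDFCode class shown above, which is repeated here so that the cell is self-contained.

```scala
class UDFCode extends Serializable {
  val appendLastNameFunc = (s: String) => {
    s"$s Johnson"
  }
}

// Call the function directly in the notebook, without registering it as a UDF.
val localResult = (new UDFCode).appendLastNameFunc("Raymond")
println(localResult) // prints Raymond Johnson
```

If the local call succeeds but the UDF call does not, the cause is more likely to be in the REPL class directory or the session dependencies than in the function.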

Troubleshooting

value res<n> is not a member of ammonite.$sess.cmd<n>.wrapper.Helper

If the following error occurs:

value res<n> is not a member of ammonite.$sess.cmd<n>.wrapper.Helper

delete the contents of the REPL class directory (the directory whose path is specified by the replClassPath variable), and then restart the notebook server.
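One way to empty the directory from a notebook cell is sketched below. It uses only java.nio from the JDK. The helper name clearDirectory is hypothetical, and the sketch assumes Scala 2.12 (on Scala 2.13, scala.jdk.CollectionConverters replaces the deprecated JavaConverters import).

```scala
import java.nio.file.{Files, Path, Paths}
import scala.collection.JavaConverters._

// Hypothetical helper: delete everything inside dir, but keep dir itself.
def clearDirectory(dir: Path): Unit = {
  if (Files.isDirectory(dir)) {
    val stream = Files.walk(dir)
    try {
      stream.iterator().asScala.toList
        .filter(_ != dir)                 // keep the top-level directory
        .sortBy(p => -p.getNameCount)     // delete deeper paths first
        .foreach(p => Files.delete(p))
    } finally {
      stream.close()
    }
  }
}

// Example usage, assuming replClassPath is defined as shown earlier in this topic:
// clearDirectory(Paths.get(replClassPath))
```

In a notebook where os-lib is already available, as in the earlier cells, os.list(replClassPathObj).foreach(p => os.remove.all(p)) should have the same effect. Either way, restart the notebook server afterward.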