Setting Up a Jupyter Notebook for Snowpark

This topic explains how to set up a Jupyter notebook for Snowpark.

Setting Up a Jupyter Notebook for Scala Development

Make sure that Jupyter is set up to use Scala. For example, you can install the Almond kernel.

Note

When using coursier to install the Almond kernel, specify a supported version of Scala.

Configuring the Jupyter Notebook for Snowpark

Next, configure the Jupyter notebook for Snowpark.

  1. In a new cell, run the following commands to define a variable for a directory:

    import sys.process._
    val replClassPath = "<path_to_a_new_directory>" // e.g. /home/myusername/replClasses
    s"mkdir -p $replClassPath".!
    

    This does the following:

    • Defines a variable for a directory for classes generated by the Scala REPL.

    • Creates that directory.

    Note

    Make sure that you have the operating system permissions to create a directory in that location.

    The Scala REPL generates classes for the Scala code that you write, including your code that defines UDFs. The Snowpark library uses this directory to find and upload the classes for your UDFs that are generated by the REPL.

    Note

    If you are using multiple notebooks, you’ll need to create and configure a separate REPL class directory for each notebook.
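
    One way to keep the directories separate is to derive each directory name from the notebook name. The following is a minimal sketch, not part of the Snowpark setup itself: the helper replClassDirFor, the replClasses_ prefix, and the use of the system temp directory as the base are all assumptions made for illustration. It uses java.nio.file instead of the mkdir command shown above.

    ```scala
    import java.nio.file.{Files, Path, Paths}

    // Hypothetical helper: creates (if needed) a REPL class directory named after the notebook.
    def replClassDirFor(baseDir: Path, notebookName: String): Path = {
      // Replace characters that are awkward in directory names.
      val safeName = notebookName.replaceAll("[^A-Za-z0-9_.-]", "_")
      // createDirectories also creates missing parents and does not fail if the directory exists.
      Files.createDirectories(baseDir.resolve(s"replClasses_$safeName"))
    }

    // Example: a directory under the system temp directory for a notebook called "sales report".
    val base = Paths.get(System.getProperty("java.io.tmpdir"))
    val replClassPath = replClassDirFor(base, "sales report").toString
    ```

    In a real notebook you would likely pick a base directory that survives restarts rather than the temp directory.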

  2. Run the following commands in a cell to configure the compiler for the Scala REPL:

    interp.configureCompiler(_.settings.outputDirs.setSingleOutput(replClassPath))
    interp.configureCompiler(_.settings.Yreplclassbased)
    interp.load.cp(os.Path(replClassPath))
    

    This does the following:

    • Configures the compiler to generate classes for the REPL in the directory that you created earlier.

    • Configures the compiler to wrap code entered in the REPL in classes, rather than in objects.

    • Adds the directory that you created earlier as a dependency of the REPL interpreter.

  3. Create a new session in Snowpark, and add the REPL class directory that you created earlier as a dependency. For example:

    // Import the Snowpark library from Maven.
    import $ivy.`com.snowflake:snowpark:0.9.0`
    
    import com.snowflake.snowpark._
    import com.snowflake.snowpark.functions._
    
    val session = Session.builder.configs(Map(
        "URL" -> "https://<account_identifier>.snowflakecomputing.com",
        "USER" -> "<username>",
        "PASSWORD" -> "<password>",
        "ROLE" -> "<role_name>",
        "WAREHOUSE" -> "<warehouse_name>",
        "DB" -> "<database_name>",
        "SCHEMA" -> "<schema_name>"
    )).create
    
    // Add the directory for REPL classes that you created earlier.
    session.addDependency(replClassPath)

    See Creating a Session for Snowpark for an explanation of the Map keys.
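
    If you prefer not to type credentials into a notebook cell, you could build the configuration Map from environment variables instead. This is only a sketch under stated assumptions: the environment variable names (SNOWFLAKE_URL and so on) and the helper configsFromEnv are hypothetical and are not defined by Snowpark. Only the Map keys come from the example above.

    ```scala
    // Hypothetical mapping from the Snowpark config keys used above to environment variable names.
    val envNames = Map(
      "URL" -> "SNOWFLAKE_URL",
      "USER" -> "SNOWFLAKE_USER",
      "PASSWORD" -> "SNOWFLAKE_PASSWORD",
      "ROLE" -> "SNOWFLAKE_ROLE",
      "WAREHOUSE" -> "SNOWFLAKE_WAREHOUSE",
      "DB" -> "SNOWFLAKE_DB",
      "SCHEMA" -> "SNOWFLAKE_SCHEMA"
    )

    // Builds the config Map from an environment-like Map.
    // Returns Left with the sorted names of any missing variables, or Right with the configs.
    def configsFromEnv(env: Map[String, String]): Either[Seq[String], Map[String, String]] = {
      val missing = envNames.values.filterNot(env.contains).toSeq.sorted
      if (missing.nonEmpty) Left(missing)
      else Right(envNames.map { case (key, envVar) => key -> env(envVar) })
    }

    // In a notebook, call configsFromEnv(sys.env) and pass the Right value
    // to Session.builder.configs(...) in place of the literal Map.
    ```

    Returning the list of missing variables makes a misconfigured environment visible before the session is created.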

  4. Run the following commands in a cell to add the Ammonite kernel classes as dependencies for your UDF:

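    def addClass(session: Session, className: String): String = {
      // Load the class so that its location can be looked up.
      val cls1 = Class.forName(className)
      // Build the resource name of the compiled class file.
      val resourceName = "/" + cls1.getName().replace(".", "/") + ".class"
      val url = cls1.getResource(resourceName)
      // Keep only the file system path of the JAR file or class file.
      val path = url.getPath().split(":").last.split("!").head
      session.addDependency(path)
      path
    }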
    addClass(session, "ammonite.repl.ReplBridge$")
    addClass(session, "ammonite.interp.api.APIHolder")
    addClass(session, "pprint.TPrintColors")
    

    Note

    If you plan to create UDFs that have dependencies that are available through Maven, you can use the addClass method defined above to add those dependencies:

    addClass(session, "<dependency_package>.<dependency_class>")
    

    If you need to specify a dependency in a JAR file, call interp.load.cp to load the JAR file for the REPL interpreter, and call session.addDependency to add the JAR file as a dependency for your UDFs:

    interp.load.cp(os.Path("<path to jar file>/<jar file>"))
    session.addDependency("<path to jar file>/<jar file>")
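
    To see what the addClass function from this step passes to session.addDependency, the string handling can be tried on its own without a session. The sketch below repeats the same split logic on two made-up URLs. The helper names resourceNameFor and dependencyPath and both example paths are assumptions for illustration only.

    ```scala
    import java.net.{URI, URL}

    // Same resource-name construction as in addClass.
    def resourceNameFor(className: String): String =
      "/" + className.replace(".", "/") + ".class"

    // Same path extraction as in addClass: drop everything up to the last ":"
    // and everything from the first "!" onward.
    def dependencyPath(url: URL): String =
      url.getPath.split(":").last.split("!").head

    // Made-up URL of a class inside a JAR file.
    val jarUrl = URI.create("jar:file:/tmp/example/lib.jar!/ammonite/repl/ReplBridge$.class").toURL
    // Made-up URL of a class file in a directory.
    val fileUrl = URI.create("file:/tmp/replClasses/UDFCode.class").toURL

    println(resourceNameFor("ammonite.repl.ReplBridge$")) // /ammonite/repl/ReplBridge$.class
    println(dependencyPath(jarUrl))                       // /tmp/example/lib.jar
    println(dependencyPath(fileUrl))                      // /tmp/replClasses/UDFCode.class
    ```

    For a class loaded from a JAR file, the result is the path of the JAR file itself. Because the logic splits on ":" and "!", a path that contains either character would be cut short.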
    

Verifying Your Jupyter Notebook Configuration

Run the following commands in a cell to verify that you can define and call an anonymous user-defined function (UDF):

class UDFCode extends Serializable {
  val appendLastNameFunc = (s: String) => {
    s"$s Johnson"
  }
}
// Define an anonymous UDF.
val appendLastNameUdf = udf((new UDFCode).appendLastNameFunc)
// Create a DataFrame that has a column NAME with a single row with the value "Raymond".
val df = session.sql("select 'Raymond' NAME")
// Call the UDF, passing in the values in the NAME column.
// Return a new DataFrame that has an additional column "Full Name" that contains the value returned by the UDF.
df.withColumn("Full Name", appendLastNameUdf(col("NAME"))).show()
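
If the verification fails, it can help to separate problems in the function itself from problems in the notebook configuration. As a small sketch, the function that backs the UDF can be called locally, without a session or a warehouse. The variable name localResult is an assumption for illustration.

```scala
// Same class as in the verification cell above.
class UDFCode extends Serializable {
  val appendLastNameFunc = (s: String) => {
    s"$s Johnson"
  }
}

// Call the function directly in the notebook's JVM, without going through Snowflake.
val localResult = (new UDFCode).appendLastNameFunc("Raymond")
println(localResult) // Raymond Johnson
```

If the local call works but the UDF call does not, the cause is more likely in the REPL class directory or dependency setup than in the function body.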