Setting Up a Jupyter Notebook for Snowpark Scala
This topic explains how to set up a Jupyter notebook for Snowpark.
Setting Up Jupyter Notebooks for Scala Development
Make sure that Jupyter is set up to use Scala. For example, you can install the Almond kernel.
Note
When using coursier to install the Almond kernel, specify a supported version of Scala.
Creating a New Notebook in a New Folder
The Snowpark library requires access to the directory that contains classes generated by the Scala REPL. If you are planning to use multiple notebooks, you must use a separate REPL class directory for each notebook.
To make it easier to set up a separate REPL class directory for each notebook, create a separate folder for each notebook:
In the Notebook Dashboard, click New » Folder to create a new folder for a notebook.
Select the checkbox next to the folder, click Rename, and assign a new name for the folder.
Click the link for the folder to navigate into the folder.
Click New » Scala to create a new notebook in that folder.
Configuring the Jupyter Notebook for Snowpark
Next, configure the Jupyter notebook for Snowpark.
In a new cell, run the following commands to define variables for the REPL class directory and create that directory:
val replClassPathObj = os.Path("replClasses", os.pwd)
if (!os.exists(replClassPathObj)) os.makeDir(replClassPathObj)
val replClassPath = replClassPathObj.toString()
This does the following:
Defines an os.Path variable and a String variable for a directory for classes generated by the Scala REPL.

Creates that directory, if it does not already exist.
The Scala REPL generates classes for the Scala code that you write, including your code that defines UDFs. The Snowpark library uses this directory to find and upload the classes for your UDFs that are generated by the REPL.
Note
If you are using multiple notebooks, you’ll need to create and configure a separate REPL class directory for each notebook. For simplicity, you can just put each notebook in a separate folder, as explained in Creating a New Notebook in a New Folder.
Run the following commands in a cell to configure the compiler for the Scala REPL:
interp.configureCompiler(_.settings.outputDirs.setSingleOutput(replClassPath))
interp.configureCompiler(_.settings.Yreplclassbased.value = true)
interp.load.cp(replClassPathObj)
This does the following:
Configures the compiler to generate classes for the REPL in the directory that you created earlier.
Configures the compiler to wrap code entered in the REPL in classes, rather than in objects.
Adds the directory that you created earlier as a dependency of the REPL interpreter.
Create a new session in Snowpark, and add the REPL class directory that you created earlier as a dependency. For example:
// Import the Snowpark library from Maven.
import $ivy.`com.snowflake:snowpark:1.15.0`
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

val session = Session.builder.configs(Map(
  "URL" -> "https://<account_identifier>.snowflakecomputing.com",
  "USER" -> "<username>",
  "PASSWORD" -> "<password>",
  "ROLE" -> "<role_name>",
  "WAREHOUSE" -> "<warehouse_name>",
  "DB" -> "<database_name>",
  "SCHEMA" -> "<schema_name>"
)).create

// Add the directory for REPL classes that you created earlier.
session.addDependency(replClassPath)
See Creating a Session for Snowpark Scala for an explanation of the Map keys.

Run the following commands in a cell to add the Ammonite kernel classes as dependencies for your UDF:
def addClass(session: Session, className: String): String = {
  val cls1 = Class.forName(className)
  val resourceName = "/" + cls1.getName().replace(".", "/") + ".class"
  val url = cls1.getResource(resourceName)
  val path = url.getPath().split(":").last.split("!").head
  session.addDependency(path)
  path
}

addClass(session, "ammonite.repl.ReplBridge$")
addClass(session, "ammonite.interp.api.APIHolder")
addClass(session, "pprint.TPrintColors")
Note
If you plan to create UDFs that have dependencies that are available through Maven, you can use the addClass method defined above to add those dependencies:

addClass(session, "<dependency_package>.<dependency_class>")
If you need to specify a dependency in a JAR file, call interp.load.cp to load the JAR file for the REPL interpreter, and call session.addDependency to add the JAR file as a dependency for your UDFs:

interp.load.cp(os.Path("<path to jar file>/<jar file>"))
session.addDependency("<path to jar file>/<jar file>")
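To clarify what the addClass method defined earlier is doing, the sketch below walks through the same resource-URL lookup in isolation. It uses scala.Predef from the Scala standard library as a stand-in for the Ammonite kernel classes; any class loaded from a JAR on the classpath resolves the same way. The variable names are illustrative, not part of the Snowpark API.

```scala
// Sketch of the lookup that addClass performs for each class name.
// scala.Predef$ stands in for classes such as ammonite.repl.ReplBridge$.
val cls = Class.forName("scala.Predef$")

// Build the absolute resource name of the compiled class file.
val resourceName = "/" + cls.getName.replace(".", "/") + ".class"

// For a class loaded from a JAR, the URL has the form:
//   jar:file:/path/to/scala-library.jar!/scala/Predef$.class
val url = cls.getResource(resourceName)

// Splitting on ":" and "!" isolates the JAR file path, which is the
// value that addClass passes to session.addDependency.
val jarPath = url.getPath.split(":").last.split("!").head

println(resourceName)
println(jarPath)
```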
Verifying Your Jupyter Notebook Configuration
Run the following commands in a cell to verify that you can define and call an anonymous user-defined function (UDF):
class UDFCode extends Serializable {
val appendLastNameFunc = (s: String) => {
s"$s Johnson"
}
}
// Define an anonymous UDF.
val appendLastNameUdf = udf((new UDFCode).appendLastNameFunc)
// Create a DataFrame that has a column NAME with a single row with the value "Raymond".
val df = session.sql("select 'Raymond' NAME")
// Call the UDF, passing in the values in the NAME column.
// Return a new DataFrame that has an additional column "Full Name" that contains the value returned by the UDF.
df.withColumn("Full Name", appendLastNameUdf(col("NAME"))).show()
Troubleshooting
value res<n> is not a member of ammonite.$sess.cmd<n>.wrapper.Helper
If the following error occurs:
value res<n> is not a member of ammonite.$sess.cmd<n>.wrapper.Helper
Delete the contents of the directory containing the REPL classes (the directory with the path specified by the replClassPath
variable), and restart the notebook server.
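As a sketch, the cleanup can also be done programmatically with the JDK file APIs before restarting the notebook server. The "replClasses" path below is an assumption; substitute the value of your replClassPath variable.

```scala
import java.io.File

// Recursively delete the contents of a directory, keeping the
// directory itself in place. A missing directory is a no-op.
def clearDir(dir: File): Unit = {
  Option(dir.listFiles()).getOrElse(Array.empty[File]).foreach { f =>
    if (f.isDirectory) clearDir(f)
    f.delete()
  }
}

// Assumed location of the REPL class directory; use your replClassPath value.
clearDir(new File("replClasses"))
```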