Snowpark Connect for Spark package reference¶
This page documents the public Python API of the snowpark-connect package. It covers server
startup, session initialization, and Snowflake-specific session helpers.
Install the package from PyPI:
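For example (the PyPI distribution name below is assumed to match the package described on this page):

```shell
pip install snowpark-connect
```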
Package exports¶
The snowflake.snowpark_connect package exports the following symbols:
| Export | Description |
|---|---|
| init_spark_session | Start the server and return a ready-to-use SparkSession. |
| start_session | Start the gRPC server without creating a client session. |
| get_session | Return a SparkSession connected to a running server. |
| skip_session_configuration | Skip the automatic ALTER SESSION configuration at startup. |
Server lifecycle¶
init_spark_session¶
Initialize and return a SparkSession connected to Snowflake. This is the most common entry
point. It starts the Snowpark Connect for Spark server (if it isn’t already running) and returns a ready-to-use
session.
| Parameter | Type | Default | Description |
|---|---|---|---|
|  |  |  | Optional Spark configuration object. |
|  |  |  | Connection parameters for the Snowpark session. |
|  | str |  | Application name for the Snowflake session. If not provided, a default is derived from the caller’s filename and a timestamp. |
Returns: SparkSession connected to Snowflake.
start_session¶
Start the Snowpark Connect for Spark gRPC server. This is a no-op if the server is already running. Use this when you need to control the server lifecycle separately from session creation, for example when running a long-lived server process for Scala clients.
| Parameter | Type | Default | Description |
|---|---|---|---|
| is_daemon | bool |  | When True, run the server in a background daemon thread and return the thread; when False, block the calling thread until the server stops. |
| stop_event | threading.Event |  | An event that, when set, signals the server to shut down. |
|  | int |  | TCP port for the gRPC server. Mutually exclusive with the Unix domain socket parameter. |
|  | str |  | Path to a Unix domain socket for the gRPC server. Mutually exclusive with the TCP port parameter. |
|  |  |  | An existing Snowpark session to reuse (for example, from a stored procedure environment). Can’t be used together with connection parameters. |
|  |  |  | Connection parameters for creating a Snowpark session. Can’t be used together with an existing Snowpark session. |
|  | int |  | Maximum gRPC message size in bytes (default 128 MiB). |
|  | str |  | Application name registered with the Snowflake session. |
Returns: threading.Thread when is_daemon=True, or None when is_daemon=False
(blocks until the server stops).
get_session¶
Return a SparkSession connected to a running Snowpark Connect for Spark server. The server must already be
started via start_session or init_spark_session.
| Parameter | Type | Default | Description |
|---|---|---|---|
|  | str |  | Spark Connect server URL. Uses the default server URL if not provided. |
|  |  |  | Optional Spark configuration object. |
Returns: SparkSession
Raises: RuntimeError if the server hasn’t been started.
execute_jar¶
Run a Java or Scala JAR inside the Snowpark Connect for Spark server process. This function manages the full lifecycle: it sets up the classpath, starts the server and JVM, executes the JAR’s main class, and shuts everything down when the application finishes.
| Parameter | Type | Default | Description |
|---|---|---|---|
|  | str | (required) | Path to the application JAR file. |
|  | str | (required) | Fully qualified name of the main class. |
|  |  |  | Arguments forwarded to the application’s main method. |
|  |  |  | Dependency JARs or globs added to the classpath. |
|  | int |  | gRPC server port; a default is used when not specified. |
|  |  |  | JVM flags passed to the JVM that runs the JAR. |
Note
execute_jar isn’t exported from the top-level package. Import it directly from
snowflake.snowpark_connect.server.
skip_session_configuration¶
Control whether Snowpark Connect for Spark runs ALTER SESSION SET for its standard parameter bundle at
startup. When set to True, you’re responsible for setting the required session parameters
manually. This is useful in restricted environments such as some Native App stored procedures.
SnowflakeSession¶
The SnowflakeSession class wraps a SparkSession to provide Snowflake SQL pass-through and
helper methods for switching database, schema, role, and warehouse. Use it when you need to run
Snowflake-specific SQL that Spark’s parser doesn’t support.
Constructor¶
| Parameter | Type | Description |
|---|---|---|
|  |  | The Spark Connect session to wrap. |
sql¶
Execute Snowflake-specific SQL directly against Snowflake. This bypasses the Spark SQL parser and sends the statement directly to Snowflake, allowing Snowflake-specific syntax that Spark doesn’t support.
| Parameter | Type | Description |
|---|---|---|
|  | str | The Snowflake SQL statement to execute. |
Returns: pyspark.sql.DataFrame
use_database¶
Switch the active database for the Snowflake session.
use_schema¶
Switch the active schema for the Snowflake session.
use_role¶
Switch the active role for the Snowflake session.
use_warehouse¶
Switch the active warehouse for the Snowflake session.
All four use_* methods accept a preserve_case parameter. When set to True, the
identifier is wrapped in double quotes to preserve its original casing. By default, Snowflake
uppercases unquoted identifiers.
Examples¶
Minimal setup¶
If your ~/.snowflake/connections.toml has a default connection configured, no parameters
are needed:
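A minimal sketch, assuming a default connection is defined in connections.toml:

```python
from snowflake.snowpark_connect import init_spark_session

# Starts the server (if it isn't already running) and connects using the
# default connection from ~/.snowflake/connections.toml.
spark = init_spark_session()

spark.sql("SELECT 1 AS one").show()
spark.stop()
```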
Connecting with explicit credentials¶
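A sketch of passing credentials directly. The connection_parameters keyword name and the exact keys shown are assumptions based on standard Snowflake connection parameters:

```python
from snowflake.snowpark_connect import init_spark_session

# Keyword name and parameter keys are assumptions; substitute the
# values for your own account.
spark = init_spark_session(
    connection_parameters={
        "account": "<account_identifier>",
        "user": "<username>",
        "password": "<password>",
        "role": "<role>",
        "warehouse": "<warehouse>",
        "database": "<database>",
        "schema": "<schema>",
    }
)
```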
Connecting with a named connection¶
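A sketch, assuming a named `[my_connection]` section exists in connections.toml; the connection_name key is an assumption borrowed from Snowpark Python conventions:

```python
from snowflake.snowpark_connect import init_spark_session

# "connection_name" selects a named entry from ~/.snowflake/connections.toml.
spark = init_spark_session(
    connection_parameters={"connection_name": "my_connection"}
)
```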
Setting an application name¶
The app name is registered as a query tag in Snowflake
(Spark-Connect-App-Name=my-etl-pipeline), making it easy to identify queries in the query
history.
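A sketch; the app_name keyword is an assumption inferred from the parameter table above:

```python
from snowflake.snowpark_connect import init_spark_session

# Appears in query history as the query tag
# Spark-Connect-App-Name=my-etl-pipeline.
spark = init_spark_session(app_name="my-etl-pipeline")
```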
Passing Spark configuration¶
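A sketch using PySpark’s SparkConf; the conf keyword name is an assumption:

```python
from pyspark import SparkConf

from snowflake.snowpark_connect import init_spark_session

conf = SparkConf()
conf.set("spark.sql.session.timeZone", "UTC")

# The `conf` keyword name is an assumption.
spark = init_spark_session(conf=conf)
```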
Separate server and session lifecycle¶
Use this pattern when you want the server to outlive individual sessions, or when multiple sessions share the same server:
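A sketch of the split lifecycle:

```python
from snowflake.snowpark_connect import get_session, start_session

# Start the server once, in a background daemon thread.
server_thread = start_session(is_daemon=True)

# Attach a client session to the already-running server.
spark = get_session()
spark.sql("SELECT CURRENT_DATE()").show()

# Stopping the client session leaves the server running for other clients.
spark.stop()
```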
Long-running standalone server¶
This blocks the calling thread and is useful for running the server as a service:
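For example:

```python
from snowflake.snowpark_connect import start_session

# Runs the server in the calling thread; returns only when the server stops.
start_session(is_daemon=False)
```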
Graceful shutdown with stop_event¶
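A sketch using the documented is_daemon and stop_event parameters:

```python
import threading

from snowflake.snowpark_connect import start_session

stop_event = threading.Event()
server_thread = start_session(is_daemon=True, stop_event=stop_event)

# ... serve client sessions ...

stop_event.set()      # Ask the server to shut down.
server_thread.join()  # Wait for the server thread to exit.
```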
Using an existing Snowpark session¶
In stored procedure environments where a Snowpark session is already available:
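A sketch of a stored procedure handler; the snowpark_session keyword name is an assumption:

```python
from snowflake.snowpark_connect import get_session, start_session

def handler(session):
    # Reuse the stored procedure's Snowpark session instead of creating
    # a new connection. The keyword name is an assumption.
    start_session(snowpark_session=session)
    spark = get_session()
    return spark.sql("SELECT CURRENT_USER()").collect()[0][0]
```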
Snowflake SQL pass-through¶
Execute Snowflake-specific SQL that Spark’s parser doesn’t support:
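A sketch; the import path for SnowflakeSession is an assumption:

```python
from snowflake.snowpark_connect import init_spark_session
from snowflake.snowpark_connect import SnowflakeSession  # import path assumed

spark = init_spark_session()
sf = SnowflakeSession(spark)

# Snowflake-specific syntax that Spark's parser would reject.
sf.sql("SHOW WAREHOUSES").show()
```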
Switching database, schema, role, and warehouse¶
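A sketch (SnowflakeSession import path assumed):

```python
from snowflake.snowpark_connect import init_spark_session
from snowflake.snowpark_connect import SnowflakeSession  # import path assumed

sf = SnowflakeSession(init_spark_session())

sf.use_database("ANALYTICS")
sf.use_schema("REPORTING")
sf.use_role("ANALYST")
sf.use_warehouse("COMPUTE_WH")
```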
Preserving case in identifiers¶
Snowflake uppercases unquoted identifiers by default. Use preserve_case=True to wrap names in
double quotes:
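A sketch (SnowflakeSession import path assumed):

```python
from snowflake.snowpark_connect import init_spark_session
from snowflake.snowpark_connect import SnowflakeSession  # import path assumed

sf = SnowflakeSession(init_spark_session())

# Unquoted, Snowflake would resolve this name as MIXEDCASEDB.
# With preserve_case=True, the identifier is sent as "MixedCaseDb".
sf.use_database("MixedCaseDb", preserve_case=True)
```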
Running a Java or Scala JAR¶
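A sketch; apart from the function name, its import path, and the two documented required values (the JAR path and the main class), everything below is an assumption, including all keyword names:

```python
from snowflake.snowpark_connect.server import execute_jar

# Keyword names below are assumptions; the JAR path and main class are
# the two documented required arguments.
execute_jar(
    "target/my-spark-app.jar",
    "com.example.MyApp",
    args=["--input", "@my_stage/input"],
    dependencies=["libs/*.jar"],
    jvm_args=["-Xmx4g"],
)
```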
Skipping automatic session configuration¶
Use this when you need full control over the Snowflake session parameters:
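A sketch; the exact calling convention of skip_session_configuration (a no-argument call versus an explicit boolean) is an assumption:

```python
from snowflake.snowpark_connect import (
    init_spark_session,
    skip_session_configuration,
)

# Call before the server starts; the calling convention is an assumption.
skip_session_configuration()

spark = init_spark_session()
# You are now responsible for any ALTER SESSION parameters the
# workload requires.
```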