Snowpark Connect for Spark Java/Scala client reference¶
This page documents the public Java API of the Snowpark Connect for Spark Java/Scala client library. It covers session creation, Snowflake SQL pass-through, and environment configuration.
All classes are in the com.snowflake.snowpark_connect.client package.
Get the library from Maven Central:
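A typical Maven dependency declaration is sketched below; the groupId, artifactId, and version shown are illustrative assumptions, so confirm the exact coordinates on Maven Central before use:

```xml
<!-- Illustrative coordinates: verify the exact groupId, artifactId,
     and latest version on Maven Central before adding this. -->
<dependency>
  <groupId>com.snowflake</groupId>
  <artifactId>snowpark-connect-client</artifactId>
  <version>LATEST_VERSION</version>
</dependency>
```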
JVM module system arguments¶
On Java 9 and later, the module system restricts reflective access to internal APIs that Apache Arrow (used by Spark Connect) requires. Add the following JVM arguments when running your application:
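As a sketch, the flag most commonly required for Arrow's off-heap memory access looks like the following; the exact set of flags varies with your JDK and Arrow versions, and the jar name is illustrative:

```shell
# Opens java.nio to Arrow's memory module. Additional --add-opens flags
# may be needed depending on the JDK and Arrow versions in use.
java \
  --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED \
  -jar my-app.jar
```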
Package exports¶
The com.snowflake.snowpark_connect.client package exports the following public classes:
| Export | Description |
|---|---|
| `SnowparkConnectSession` | Entry point for building a `SparkSession` connected to a Snowpark Connect for Spark server. |
| `SnowflakeSession` | Wraps a `SparkSession` to provide Snowflake SQL pass-through and context-switching helpers. |
| `SnowparkConnectServerException` | Runtime exception thrown when the server can’t be started or communicated with. |
SnowparkConnectSession¶
Entry point for building a SparkSession connected to a Snowpark Connect for Spark server. This is the most
common entry point for Java and Scala applications.
builder¶
Returns a new SnowparkConnectSessionBuilder for configuring the session.
Returns: SnowparkConnectSessionBuilder
getOrCreate¶
Returns the singleton SparkSession with default configuration. Equivalent to
SnowparkConnectSession.builder().getOrCreate().
Returns: SparkSession
Throws: SnowparkConnectServerException if the server can’t be started (local mode only).
SnowparkConnectSessionBuilder¶
Builder that produces a SparkSession connected to a Snowpark Connect for Spark server. Handles environment
detection, automatic server creation, and Spark config forwarding.
Resolution order for the server URL:
1. If the `SPARK_REMOTE` environment variable is set: connect directly to that URL.
2. If `SNOWPARK_SUBMIT_JOB=true`: connect to the sidecar server at `sc://localhost:15002`.
3. Auto mode: launch a Python server subprocess using the configured venv.
appName¶
Set the application name, forwarded to SparkSession.Builder.appName(). The app name is
registered as a query tag in Snowflake (Spark-Connect-App-Name=<name>), making it easy to
identify queries in the query history.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `String` | | Application name for the Snowflake session. |
config¶
Set Spark configuration properties, forwarded to SparkSession.Builder.config().
| Parameter | Type | Default | Description |
|---|---|---|---|
| `key` | `String` | (required) | Spark configuration property name. |
| `value` | `String` | (required) | Spark configuration property value. |
| `map` | `Map` | (required) | Multiple configuration properties at once. Both Java and Scala map types are accepted. |
pythonVenv¶
Set the path to the Python virtual environment containing the snowpark-connect package. This
is forwarded to the server builder if no explicit server is provided and SPARK_REMOTE is not
set.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `path` | `String` | (none) | Path to a Python virtual environment. If not set, falls back to the `SNOWPARK_CONNECT_PYTHON_VENV` environment variable. |
getOrCreate¶
Returns a SparkSession connected to a Snowpark Connect for Spark server. Delegates to the OSS
SparkSession.Builder.getOrCreate() which caches sessions by connection configuration.
Returns: SparkSession
Throws: SnowparkConnectServerException if the server can’t be started (local mode only).
SnowflakeSession¶
The SnowflakeSession class wraps a SparkSession to provide Snowflake SQL pass-through and
helper methods for switching database, schema, role, and warehouse. Use it when you need to run
Snowflake-specific SQL that Spark’s parser doesn’t support.
Constructor¶
| Parameter | Type | Description |
|---|---|---|
| `spark` | `SparkSession` | The Spark Connect session to wrap. Must not be null. |
sql¶
Execute Snowflake-specific SQL directly against Snowflake. This bypasses the Spark SQL parser and sends the statement directly to Snowflake, allowing Snowflake-specific syntax that Spark doesn’t support.
| Parameter | Type | Description |
|---|---|---|
| `query` | `String` | The Snowflake SQL statement to execute. Must not be null. |
Returns: Dataset<Row>
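A minimal sketch of the pass-through path, assuming a default connection is configured; `SHOW WAREHOUSES` is one example of Snowflake syntax that Spark’s parser rejects:

```java
import com.snowflake.snowpark_connect.client.SnowflakeSession;
import com.snowflake.snowpark_connect.client.SnowparkConnectSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PassThroughExample {
    public static void main(String[] args) {
        SparkSession spark = SnowparkConnectSession.getOrCreate();
        SnowflakeSession sf = new SnowflakeSession(spark);

        // SHOW WAREHOUSES is Snowflake-specific syntax, so it must bypass
        // the Spark SQL parser and go directly to Snowflake.
        Dataset<Row> warehouses = sf.sql("SHOW WAREHOUSES");
        warehouses.show();
    }
}
```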
useDatabase¶
Switch the active database for the Snowflake session.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `database` | `String` | (required) | The database name. |
| `preserveCase` | `boolean` | `false` | If `true`, wrap the identifier in double quotes to preserve its original casing. |
useSchema¶
Switch the active schema for the Snowflake session.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `schema` | `String` | (required) | The schema name. |
| `preserveCase` | `boolean` | `false` | If `true`, wrap the identifier in double quotes to preserve its original casing. |
useRole¶
Switch the active role for the Snowflake session.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `role` | `String` | (required) | The role name. |
| `preserveCase` | `boolean` | `false` | If `true`, wrap the identifier in double quotes to preserve its original casing. |
useWarehouse¶
Switch the active warehouse for the Snowflake session.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `warehouse` | `String` | (required) | The warehouse name. |
| `preserveCase` | `boolean` | `false` | If `true`, wrap the identifier in double quotes to preserve its original casing. |
All four use* methods pass identifiers unquoted by default, so Snowflake uppercases them. Set
preserveCase to true to wrap the identifier in double quotes and preserve its original
casing.
SnowparkConnectServerException¶
Runtime exception thrown when the Snowpark Connect for Spark server can’t be started, configured, or communicated with.
How it works¶
The library detects the execution environment at startup:
- If the `SPARK_REMOTE` environment variable is set, the library connects directly to the pre-existing server. No Python venv is needed.
- Otherwise, it resolves the Python venv, locates `start_server.py` from the installed `snowpark-connect` package, starts a server subprocess on a random free port, and connects to it.
Each JVM gets its own server on its own port, so multiple IDE windows work simultaneously without port collisions. The server process is cleaned up automatically through a JVM shutdown hook.
The same application code runs unchanged whether launched from a local IDE (where the library
starts the Python server automatically) or inside a snowpark-submit job (where the server
already exists in a sibling container).
Environment variables¶
Only `SNOWPARK_CONNECT_PYTHON_VENV` can be overridden through the public code API.
| Variable | Description | Default |
|---|---|---|
| `SNOWPARK_CONNECT_PYTHON_VENV` | Path to a Python virtual environment containing the `snowpark-connect` package. | (none) |
| `SNOWPARK_SUBMIT_JOB` | Set to `true` in `snowpark-submit` jobs; the library then connects to the sidecar server at `sc://localhost:15002`. | (none) |
|  | Server startup timeout in seconds. |  |
| `SPARK_REMOTE` | Spark Connect URL of a pre-existing server. When set, the library connects directly and doesn’t start a subprocess. | (none) |
Examples¶
Minimal setup¶
If your ~/.snowflake/connections.toml has a default connection configured, no parameters
are needed:
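A minimal sketch, assuming the client library is on the classpath; the app name and query are illustrative:

```java
import com.snowflake.snowpark_connect.client.SnowparkConnectSession;
import org.apache.spark.sql.SparkSession;

public class MinimalExample {
    public static void main(String[] args) {
        // Connection details come from the default profile in
        // ~/.snowflake/connections.toml.
        SparkSession spark = SnowparkConnectSession.builder()
                .appName("minimal-example")
                .getOrCreate();

        spark.sql("SELECT CURRENT_TIMESTAMP()").show();
        spark.stop();
    }
}
```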
Specifying a Python venv¶
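A sketch of pointing the builder at an explicit venv; the path shown is illustrative:

```java
import com.snowflake.snowpark_connect.client.SnowparkConnectSession;
import org.apache.spark.sql.SparkSession;

public class VenvExample {
    public static void main(String[] args) {
        // Point pythonVenv at the environment where the
        // snowpark-connect Python package is installed.
        SparkSession spark = SnowparkConnectSession.builder()
                .pythonVenv("/path/to/scos-venv")
                .getOrCreate();
    }
}
```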
Alternatively, set SNOWPARK_CONNECT_PYTHON_VENV=/path/to/scos-venv and omit the
.pythonVenv() call. This keeps your code environment-independent.
Passing Spark configuration¶
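A sketch of forwarding Spark configuration through the builder; the property names and values are illustrative, and the map overload reflects the documented acceptance of Java map types:

```java
import java.util.Map;

import com.snowflake.snowpark_connect.client.SnowparkConnectSession;
import org.apache.spark.sql.SparkSession;

public class ConfigExample {
    public static void main(String[] args) {
        SparkSession spark = SnowparkConnectSession.builder()
                // Single key/value pair.
                .config("spark.sql.session.timeZone", "UTC")
                // Multiple properties at once via a Java map.
                .config(Map.of("spark.sql.shuffle.partitions", "8"))
                .getOrCreate();
    }
}
```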
Switching database, schema, role, and warehouse¶
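A sketch of the four context-switching helpers; all object names below are illustrative:

```java
import com.snowflake.snowpark_connect.client.SnowflakeSession;
import com.snowflake.snowpark_connect.client.SnowparkConnectSession;

public class ContextSwitchExample {
    public static void main(String[] args) {
        SnowflakeSession sf =
                new SnowflakeSession(SnowparkConnectSession.getOrCreate());

        sf.useDatabase("ANALYTICS_DB");
        sf.useSchema("PUBLIC");
        sf.useRole("ANALYST");
        sf.useWarehouse("COMPUTE_WH");
    }
}
```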
Preserving case in identifiers¶
Snowflake uppercases unquoted identifiers by default. Use preserveCase to wrap names in double
quotes:
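A sketch contrasting the default behavior with `preserveCase`; the database name is illustrative:

```java
import com.snowflake.snowpark_connect.client.SnowflakeSession;
import com.snowflake.snowpark_connect.client.SnowparkConnectSession;

public class PreserveCaseExample {
    public static void main(String[] args) {
        SnowflakeSession sf =
                new SnowflakeSession(SnowparkConnectSession.getOrCreate());

        sf.useDatabase("MyDb");        // unquoted: Snowflake resolves it as MYDB
        sf.useDatabase("MyDb", true);  // quoted as "MyDb": original casing preserved
    }
}
```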
Default session¶
The simplest way to get a session with default configuration:
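A one-line sketch using the static entry point:

```java
import com.snowflake.snowpark_connect.client.SnowparkConnectSession;
import org.apache.spark.sql.SparkSession;

public class DefaultSessionExample {
    public static void main(String[] args) {
        // Equivalent to SnowparkConnectSession.builder().getOrCreate().
        SparkSession spark = SnowparkConnectSession.getOrCreate();
        spark.sql("SELECT 1").show();
    }
}
```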