Snowpark Connect for Spark properties¶
Snowpark Connect for Spark supports custom configuration in a way that is similar to standard Spark. You can modify configuration properties only through the session configuration's set method (spark.conf.set), passing a key-value pair. Note that Snowpark Connect for Spark recognizes only a limited set of properties that influence execution; any unsupported properties are silently ignored without raising an exception.
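For example, with an active Snowpark Connect for Spark session in the variable spark, you can set and read back a property as follows (the property used here is only an illustration):
# Set a supported property as a key-value pair.
# Unsupported keys are accepted but silently ignored.
spark.conf.set("spark.sql.ansi.enabled", "true")

# Read the current value back.
print(spark.conf.get("spark.sql.ansi.enabled"))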
Supported Spark properties¶
Snowpark Connect for Spark supports a subset of Spark properties.
| Property Name | Default | Meaning | Since |
|---|---|---|---|
| spark.app.name | (none) | Application name, set as the Snowflake query tag. | 1.0.0 |
| spark.hadoop.fs.s3a.access.key | (none) | AWS access key ID for S3 authentication when reading or writing to S3 locations. | 1.0.0 |
| spark.hadoop.fs.s3a.assumed.role.arn | (none) | AWS IAM role ARN with S3 access when using role-based authentication. | 1.0.0 |
| spark.hadoop.fs.s3a.secret.key | (none) | AWS secret access key for S3 authentication when reading or writing to S3 locations. | 1.0.0 |
| spark.hadoop.fs.s3a.server-side-encryption.key | (none) | AWS KMS key ID for server-side encryption when using SSE-KMS. | 1.0.0 |
| spark.hadoop.fs.s3a.session.token | (none) | AWS session token for temporary S3 credentials when using STS. | 1.0.0 |
| spark.sql.ansi.enabled | | Enables ANSI SQL mode for stricter type checking and error handling. | 1.0.0 |
| spark.sql.caseSensitive | | Controls case sensitivity for identifiers. | 1.0.0 |
| spark.sql.crossJoin.enabled | | Enables or disables implicit cross joins. | 1.0.0 |
| spark.sql.globalTempDatabase | | Schema name for global temporary views; created automatically if it does not exist. | 1.0.0 |
| spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue | | Legacy behavior for dataset grouping key naming. | 1.6.0 |
| spark.sql.mapKeyDedupPolicy | | Controls behavior when duplicate keys are found in map creation. Values: EXCEPTION, LAST_WIN. | 1.0.0 |
| spark.sql.parquet.outputTimestampType | | Controls the Parquet output timestamp type. | 1.7.0 |
| spark.sql.repl.eagerEval.maxNumRows | | Maximum number of rows to display in REPL eager evaluation mode. | 1.0.0 |
| spark.sql.repl.eagerEval.truncate | | Maximum width for column values in REPL eager evaluation display before truncation. | 1.0.0 |
| spark.sql.session.localRelationCacheThreshold | | Byte threshold for caching local relations. Relations larger than this are cached to improve performance. | 1.0.0 |
| spark.sql.session.timeZone | | Session time zone used for timestamp operations; synced with the Snowflake session time zone. | 1.0.0 |
| spark.sql.sources.default | | Default data source format for read/write operations when a format is not explicitly specified. | 1.0.0 |
| spark.sql.timestampType | | Default timestamp type for timestamp operations. Values: TIMESTAMP_LTZ, TIMESTAMP_NTZ. | 1.0.0 |
Supported Snowpark Connect for Spark properties¶
Custom configuration properties specific to Snowpark Connect for Spark.
| Property Name | Default | Meaning | Since |
|---|---|---|---|
| fs.azure.sas.<container>.<account>.blob.core.windows.net | (none) | Azure SAS token for Blob Storage authentication. Used when reading or writing to Azure Blob Storage locations. | 1.0.0 |
| fs.azure.sas.fixed.token.<account>.dfs.core.windows.net | (none) | Azure SAS token for ADLS Gen2 (Data Lake Storage) authentication. Used when reading or writing to Azure Data Lake Storage Gen2 locations. | 1.0.0 |
| mapreduce.fileoutputcommitter.marksuccessfuljobs | false | When true, generates a _SUCCESS file after successful write operations for compatibility with Hadoop or Spark workflows. | 1.0.0 |
| parquet.enable.summary-metadata | false | Alternative config for generating Parquet summary metadata files. Either this or spark.sql.parquet.enable.summary-metadata enables the feature. | 1.4.0 |
| snowflake.repartition.for.writes | false | When true, forces DataFrame.repartition(n) to split output into n files during writes. Matches Spark behavior but adds overhead. | 1.0.0 |
| snowpark.connect.cte.optimization_enabled | false | When true, enables Common Table Expression (CTE) optimization in the Snowpark session for query performance. | 1.0.0 |
| snowpark.connect.describe_cache_ttl_seconds | 300 | Time-to-live in seconds for query cache entries. Reduces repeated schema lookups. | 1.0.0 |
| snowpark.connect.enable_snowflake_extension_behavior | false | When true, enables Snowflake-specific extensions that can differ from Spark behavior. | 1.0.0 |
| snowpark.connect.handleIntegralOverflow | false | When true, aligns integral overflow behavior with the Spark approach. | 1.7.0 |
| snowpark.connect.iceberg.external_volume | (none) | Snowflake external volume name for Iceberg table operations. | 1.0.0 |
| snowpark.connect.integralTypesEmulation | client_default | Controls conversion of decimal to integral types. Values: client_default, enabled, disabled. | 1.7.0 |
| snowpark.connect.scala.version | 2.12 | Controls the Scala version used (supports 2.12 or 2.13). | 1.7.0 |
| snowpark.connect.sql.partition.external_table_location | (none) | External table location path for partitioned writes. | 1.4.0 |
| snowpark.connect.temporary.views.create_in_snowflake | false | When true, creates temporary views directly in Snowflake instead of managing them locally. | 1.0.1 |
| snowpark.connect.udf.imports (deprecated in 1.7.0) | (none) | Comma-separated list of files or modules to import for UDF execution. Triggers UDF recreation when changed. | 1.0.0 |
| snowpark.connect.udf.python.imports | (none) | Comma-separated list of files or modules to import for Python UDF execution. Triggers UDF recreation when changed. | 1.7.0 |
| snowpark.connect.udf.java.imports | (none) | Comma-separated list of files or modules to import for Java UDF execution. Triggers UDF recreation when changed. | 1.7.0 |
| snowpark.connect.udf.packages | (none) | Comma-separated list of Python packages to include when registering UDFs. | 1.0.0 |
| snowpark.connect.udtf.compatibility_mode | false | When true, enables Spark-compatible UDTF behavior for improved compatibility with Spark UDTF semantics. | 1.0.0 |
| snowpark.connect.version | <current_version> | Read-only. Returns the current Snowpark Connect for Spark version. | 1.0.0 |
| snowpark.connect.views.duplicate_column_names_handling_mode | rename | How to handle duplicate column names in views. Values: rename, fail, drop. | 1.0.0 |
fs.azure.sas.<container>.<account>.blob.core.windows.net¶
Specifies the Azure SAS token for Blob Storage authentication. Used when reading or writing to Azure Blob Storage locations.
Default: (none)
Since: 1.0.0
fs.azure.sas.fixed.token.<account>.dfs.core.windows.net¶
Specifies the Azure SAS token for ADLS Gen2 (Data Lake Storage) authentication. Used when reading or writing to Azure Data Lake Storage Gen2 locations.
Default: (none)
Since: 1.0.0
mapreduce.fileoutputcommitter.marksuccessfuljobs¶
Specify true to generate a _SUCCESS file after successful write operations, for compatibility with Hadoop or Spark workflows.
Default: false
Since: 1.0.0
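Example¶
A minimal sketch, assuming an existing DataFrame df and a writable stage path (the path below is hypothetical):
# Emit a _SUCCESS marker file alongside the written output, as Hadoop and Spark
# file output committers do.
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "true")
df.write.mode("overwrite").parquet("@my_stage/output/")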
parquet.enable.summary-metadata¶
Specifies an alternative configuration for generating Parquet summary metadata files. Enable the feature with either this property or spark.sql.parquet.enable.summary-metadata.
Default: false
Since: 1.4.0
snowflake.repartition.for.writes¶
Specify true to force DataFrame.repartition(n) to split output into n files during writes. Matches Spark behavior but adds overhead.
Default: false
Since: 1.0.0
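Example¶
A minimal sketch, assuming an existing DataFrame df and a hypothetical output path:
# Honor the requested partition count so that the write produces 4 output files,
# matching Spark behavior at the cost of an extra repartition step.
spark.conf.set("snowflake.repartition.for.writes", "true")
df.repartition(4).write.mode("overwrite").parquet("@my_stage/output/")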
snowpark.connect.cte.optimization_enabled¶
Specify true to enable Common Table Expression (CTE) optimization in the Snowpark session for query performance.
Default: false
Since: 1.0.0
Comments¶
This configuration enables Snowflake Common Table Expressions (CTEs). It optimizes Snowflake queries that contain many repetitive code blocks, improving both query compilation and execution performance.
snowpark.connect.describe_cache_ttl_seconds ¶
Specifies the time to live, in seconds, for query cache entries. Reduces repeated schema lookups.
Default: 300
Since: 1.0.0
snowpark.connect.enable_snowflake_extension_behavior¶
Specify true to enable Snowflake-specific extensions that can differ from Spark behavior (such as hash on MAP types or the MD5 return type).
Default: false
Since: 1.0.0
Comments¶
When set to true, changes the behavior of certain operations, as sketched in the example after this list:
- bit_get / getbit — Explicit use of the Snowflake getbit function
- hash — Explicit use of the Snowflake hash function
- md5 — Explicit use of the Snowflake md5 function
- Renaming table columns — Allows for altering table columns
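Example¶
A minimal sketch of toggling the extension behavior before calling one of the affected functions; the DataFrame and column names are made up for illustration:
from pyspark.sql.functions import col, md5

# With the extension behavior enabled, md5 is evaluated with the Snowflake MD5
# function, so its result can differ from vanilla Spark.
spark.conf.set("snowpark.connect.enable_snowflake_extension_behavior", "true")

df = spark.createDataFrame([("abc",)], ["s"])
df.select(md5(col("s")).alias("digest")).show()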
snowpark.connect.handleIntegralOverflow¶
Specify true to align integral overflow behavior with the Spark approach.
Default: false
Since: 1.7.0
snowpark.connect.iceberg.external_volume ¶
Specifies the Snowflake external volume name for Iceberg table operations.
Default: (none)
Since: 1.0.0
snowpark.connect.integralTypesEmulation¶
Specifies how to convert decimal to integral types. Values: client_default, enabled, disabled
Default: client_default
Since: 1.7.0
Comments¶
By default, Snowpark Connect for Spark treats all integral types as Long types. This is caused by the way numbers are represented in Snowflake. Integral types emulation allows for an exact mapping between Snowpark and Spark types when reading from datasources.
The default option client_default activates the emulation only when the script is executed from the Scala client. Integral types are mapped based on the following precisions:
| Precision | Spark Type |
|---|---|
| 19 | LongType |
| 10 | IntegerType |
| 5 | ShortType |
| 3 | ByteType |
| Other | DecimalType |
When other precisions are found, the final type is mapped to the DecimalType.
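Example¶
A minimal sketch of forcing the emulation on and inspecting the resulting Spark types (the table name is hypothetical):
# Force exact integral-type mapping regardless of the client language.
spark.conf.set("snowpark.connect.integralTypesEmulation", "enabled")

# Columns stored as NUMBER(10,0) now surface as IntegerType instead of LongType.
spark.table("my_db.my_schema.my_table").printSchema()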
snowpark.connect.scala.version¶
Specifies the Scala version to use (supports 2.12 or 2.13).
Default: 2.12
Since: 1.7.0
snowpark.connect.sql.partition.external_table_location ¶
Specifies the external table location path for partitioned writes.
Default: (none)
Since: 1.4.0
Comments¶
To read only an exact subset of partitioned files from the provided directory, additional configuration is required. This feature is only available for files stored on external stages. To prune the read files, Snowpark Connect for Spark uses external tables.
This feature is enabled when the configuration snowpark.connect.sql.partition.external_table_location is set. It should contain existing database and schema names where external tables will be created.
Reading Parquet files stored on external stages creates an external table; for files on internal stages, no external table is created. Providing a schema reduces execution time by eliminating the cost of inferring it from the source files.
For best performance, filter according to the Snowflake External Tables filtering limitations.
Example¶
spark.conf.set("snowpark.connect.sql.partition.external_table_location", "<database-name>.<schema-name>")
spark.read.parquet("@external-stage/example").filter(col("x") > lit(1)).show()
schema = StructType([StructField("x",IntegerType()),StructField("y",DoubleType())])
spark.read.schema(schema).parquet("@external-stage/example").filter(col("x") > lit(1)).show()
snowpark.connect.temporary.views.create_in_snowflake ¶
Specify true to create temporary views directly in Snowflake instead of managing them locally.
Default: false
Since: 1.0.1
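Example¶
A minimal sketch, assuming an existing DataFrame df; the view name is arbitrary:
# Create the temporary view as a Snowflake object instead of tracking it locally.
spark.conf.set("snowpark.connect.temporary.views.create_in_snowflake", "true")

df.createOrReplaceTempView("my_temp_view")
spark.sql("SELECT * FROM my_temp_view").show()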
snowpark.connect.udf.imports [DEPRECATED 1.7.0]¶
Specifies a comma-separated list of files and modules to import for UDF execution. When this value is changed, it triggers UDF recreation.
Default: (none)
Since: 1.0.0
snowpark.connect.udf.python.imports¶
Specifies a comma-separated list of files and modules to import for Python UDF execution. When this value is changed, it triggers UDF recreation.
Default: (none)
Since: 1.7.0
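Example¶
A minimal sketch, assuming a helper module helpers.py has already been uploaded to a stage; the stage path, module, and function names are hypothetical:
from pyspark.sql.functions import udf

# Make the staged module available to Python UDFs.
# Changing this value triggers UDF re-creation.
spark.conf.set("snowpark.connect.udf.python.imports", "[@my_stage/helpers.py]")

@udf(returnType="string")
def greet(name: str) -> str:
    import helpers  # provided through the staged import above
    return helpers.greeting(name)

spark.createDataFrame([("Ada",)], ["name"]).select(greet("name")).show()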
snowpark.connect.udf.java.imports¶
Specifies a comma-separated list of files and modules to import for Java UDF execution. Triggers UDF recreation when changed.
Default: (none)
Since: 1.7.0
Comments¶
This configuration works very similarly to snowpark.connect.udf.python.imports. With it, you can specify external libraries and files for Java UDFs created using registerJavaFunction. The two configurations are mutually exclusive to prevent unnecessary dependency mixing.
To include external libraries and files, provide stage paths to the files as the value of the snowpark.connect.udf.java.imports configuration setting. The value is an array of stage paths, where the paths are separated by commas.
Example¶
Code in the following example includes a staged JAR file in the UDF’s execution context. The UDF uses classes from that JAR in its logic.
# Files need to be previously staged.
spark.conf.set("snowpark.connect.udf.java.imports", "[@stage/library.jar]")

# Register the Java UDF and call it from SQL.
spark.udf.registerJavaFunction("javaFunction", "com.example.ExampleFunction")
spark.sql("SELECT javaFunction('arg')").show()
You can use the snowpark.connect.udf.java.imports setting to include other kinds of files as well, such as those with data your code needs to read. Note that when you do this, your code should only read from the included files; any writes to such files will be lost after the function’s execution ends.
snowpark.connect.udf.packages¶
Specifies a comma-separated list of Python packages to include when registering UDFs.
Default: (none)
Since: 1.0.0
Comments¶
You can use this to define additional packages to be available in Python UDFs. The value is a comma-separated list of dependencies.
You can discover the list of supported packages by executing the following SQL in Snowflake:
SELECT * FROM INFORMATION_SCHEMA.PACKAGES WHERE LANGUAGE = 'python';
Example¶
spark.conf.set("snowpark.connect.udf.packages", "[numpy]")
@udtf(returnType="val: int")
class Powers:
def eval(self, x: int):
import numpy as np
for v in np.power(np.array([x, x, x]), [0, 1, 2]):
yield (int(v),)
spark.udtf.register(name="powers", f=Powers)
spark.sql("SELECT * FROM powers(10)").show()
For more information, see Python.
snowpark.connect.udtf.compatibility_mode¶
Specify true to enable Spark-compatible UDTF behavior for improved compatibility with Spark UDTF semantics.
Default: false
Since: 1.0.0
Comments¶
This property determines whether UDTFs use Spark-compatible behavior or the default Snowpark behavior. When set to true, UDTFs run inside a compatibility wrapper that applies Spark-style automatic output type coercion (for example, the string “true” to a boolean, or a boolean to an integer) and Spark-style error handling. The wrapper also converts table arguments to Row-like objects for both positional and named access, and handles SQL NULL values to match Spark’s behavior.
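Example¶
A minimal sketch of the output coercion described above; the UDTF is made up for illustration and relies on the compatibility wrapper to coerce the yielded string to a boolean:
from pyspark.sql.functions import udtf

spark.conf.set("snowpark.connect.udtf.compatibility_mode", "true")

@udtf(returnType="flag: boolean")
class Flags:
    def eval(self, x: int):
        # The string "true" is coerced to a boolean, mirroring Spark's
        # output type coercion.
        yield ("true",)

spark.udtf.register(name="flags", f=Flags)
spark.sql("SELECT * FROM flags(1)").show()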
snowpark.connect.version¶
Returns the current Snowpark Connect for Spark version. Read only.
Default: <current_version>
Since: 1.0.0
snowpark.connect.views.duplicate_column_names_handling_mode¶
Specifies how to handle duplicate column names in views. Allowed values are rename (add a suffix), fail (raise an error), and drop (remove duplicates).
Default: rename
Since: 1.0.0
Comments¶
Snowflake does not support duplicate column names.
Example¶
The following code fails at the view-creation step with the SQL compilation error “duplicate column name ‘foo’”.
df = spark.createDataFrame([
    (1, 1),
    (2, 2)
], ["foo", "foo"])

df.show()                     # Works: DataFrames may carry duplicate column names.
df.createTempView("df_view")  # Fails: SQL compilation error: duplicate column name 'foo'
To work around this, set the snowpark.connect.views.duplicate_column_names_handling_mode configuration option to one of the following values:
- rename: A suffix such as _dedup_1, _dedup_2, and so on is appended to each duplicate column name after the first one.
- drop: All of the duplicate columns except one are dropped. If the columns have different values, this might lead to incorrect results.
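Example¶
A minimal sketch of the rename workaround applied to the DataFrame above:
# Deduplicate the column names instead of failing.
spark.conf.set("snowpark.connect.views.duplicate_column_names_handling_mode", "rename")

df.createTempView("df_view")
spark.sql("SELECT * FROM df_view").show()  # Columns: foo, foo_dedup_1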