Snowpark Connect for Spark properties

Snowpark Connect for Spark supports custom configuration in a way that is similar to standard Spark. You can modify configuration properties only through the session configuration's set method, passing a key-value pair. Note that Snowpark Connect for Spark recognizes only a limited set of properties that influence execution; any unsupported property is silently ignored and no exception is raised.
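
For example, a minimal sketch of setting and reading back a property from PySpark (the value shown is illustrative):

# Set a supported property as a key-value pair on the session configuration
spark.conf.set("spark.sql.session.timeZone", "UTC")

# Read the effective value back
print(spark.conf.get("spark.sql.session.timeZone"))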

Supported Spark properties

Snowpark Connect for Spark supports a subset of Spark properties.

Property Name | Default | Meaning | Since
spark.app.name | (none) | Application name set as the Snowflake query_tag (Spark-Connect-App-Name={name}) for query tracking. | 1.0.0
spark.Catalog.databaseFilterInformationSchema | false | When true, filters out INFORMATION_SCHEMA from database listings in catalog operations. | 1.0.0
spark.hadoop.fs.s3a.access.key | (none) | AWS access key ID for S3 authentication when reading or writing to S3 locations. | 1.0.0
spark.hadoop.fs.s3a.assumed.role.arn | (none) | AWS IAM role ARN with S3 access when using role-based authentication. | 1.0.0
spark.hadoop.fs.s3a.secret.key | (none) | AWS secret access key for S3 authentication when reading or writing to S3 locations. | 1.0.0
spark.hadoop.fs.s3a.server-side-encryption.key | (none) | AWS KMS key ID for server-side encryption when using the AWS_SSE_KMS encryption type. | 1.0.0
spark.hadoop.fs.s3a.session.token | (none) | AWS session token for temporary S3 credentials when using STS. | 1.0.0
spark.sql.ansi.enabled | false | Enables ANSI SQL mode for stricter type checking and error handling. When true, arithmetic overflows and invalid casts raise errors instead of returning NULL. | 1.0.0
spark.sql.caseSensitive | false | Controls case sensitivity for identifiers. When false, column and table names are case-insensitive (auto-uppercased in Snowflake). | 1.0.0
spark.sql.crossJoin.enabled | true | Enables or disables implicit cross joins. When false, a missing or trivial join condition results in an error. | 1.0.0
spark.sql.execution.pythonUDTF.arrow.enabled | false | When true, enables Apache Arrow optimization for Python UDTF serialization and deserialization. | 1.0.0
spark.sql.globalTempDatabase | global_temp | Schema name for global temporary views; created automatically if it does not exist. | 1.0.0
spark.sql.legacy.allowHashOnMapType | false | When true, allows hashing MAP type columns. By default, MAP types cannot be hashed, for consistency with Spark behavior. | 1.0.0
spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue | false | Legacy behavior for dataset grouping key naming. | 1.6.0
spark.sql.mapKeyDedupPolicy | EXCEPTION | Controls behavior when duplicate keys are found during map creation. Values: EXCEPTION (raise an error) or LAST_WIN (keep the last value). | 1.0.0
spark.sql.parser.quotedRegexColumnNames | false | When true, enables regex pattern matching in quoted column names in SQL queries (for example, SELECT '(col1|col2)' FROM table). | 1.0.0
spark.sql.parquet.outputTimestampType | TIMESTAMP_MILLIS | Controls the Parquet output timestamp type. Supports TIMESTAMP_MILLIS or TIMESTAMP_MICROS. | 1.7.0
spark.sql.pyspark.inferNestedDictAsStruct.enabled | false | When true, infers nested Python dictionaries as StructType instead of MapType during DataFrame creation. | 1.0.0
spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled | false | When true, infers the array element type from the first element only instead of sampling all elements. | 1.0.0
spark.sql.repl.eagerEval.enabled | false | When true, enables eager evaluation in the REPL, showing DataFrame results automatically without calling show(). | 1.0.0
spark.sql.repl.eagerEval.maxNumRows | 20 | Maximum number of rows to display in REPL eager evaluation mode. | 1.0.0
spark.sql.repl.eagerEval.truncate | 20 | Maximum width for column values in the REPL eager evaluation display before truncation. | 1.0.0
spark.sql.session.localRelationCacheThreshold | 2147483647 | Byte threshold for caching local relations. Relations larger than this are cached to improve performance. | 1.0.0
spark.sql.session.timeZone | <system_local_timezone> | Session time zone used for timestamp operations. Synced with the Snowflake session via ALTER SESSION SET TIMEZONE. | 1.0.0
spark.sql.sources.default | parquet | Default data source format for read/write operations when a format is not explicitly specified. | 1.0.0
spark.sql.timestampType | TIMESTAMP_LTZ | Default timestamp type for timestamp operations. Values: TIMESTAMP_LTZ (with local time zone) or TIMESTAMP_NTZ (no time zone). | 1.0.0
spark.sql.tvf.allowMultipleTableArguments.enabled | true | When true, allows table-valued functions to accept multiple table arguments. | 1.0.0
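
As an illustration, the following sketch sets a few of these properties before reading from S3; the credentials, bucket, and path are placeholders:

# Configure S3 credentials and session behavior (placeholder values)
spark.conf.set("spark.hadoop.fs.s3a.access.key", "<aws-access-key-id>")
spark.conf.set("spark.hadoop.fs.s3a.secret.key", "<aws-secret-access-key>")
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

# Read Parquet data from an S3 location
df = spark.read.parquet("s3a://<bucket>/<path>/")
df.show()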

Supported Snowpark Connect for Spark properties

Custom configuration properties specific to Snowpark Connect for Spark.

Property Name | Default | Meaning | Since
fs.azure.sas.<container>.<account>.blob.core.windows.net | (none) | Azure SAS token for Blob Storage authentication. Used when reading or writing to Azure Blob Storage locations. | 1.0.0
fs.azure.sas.fixed.token.<account>.dfs.core.windows.net | (none) | Azure SAS token for ADLS Gen2 (Data Lake Storage) authentication. Used when reading or writing to Azure Data Lake Storage Gen2 locations. | 1.0.0
mapreduce.fileoutputcommitter.marksuccessfuljobs | false | When true, generates a _SUCCESS file after successful write operations for compatibility with Hadoop/Spark workflows. | 1.0.0
parquet.enable.summary-metadata | false | Alternative configuration for generating Parquet summary metadata files. Either this property or spark.sql.parquet.enable.summary-metadata enables the feature. | 1.4.0
snowflake.repartition.for.writes | false | When true, forces DataFrame.repartition(n) to split output into n files during writes. Matches Spark behavior but adds overhead. | 1.0.0
snowpark.connect.cte.optimization_enabled | false | When true, enables Common Table Expression (CTE) optimization in Snowpark sessions for improved query performance. | 1.0.0
snowpark.connect.describe_cache_ttl_seconds | 300 | Time-to-live in seconds for query cache entries. Reduces repeated schema lookups. | 1.0.0
snowpark.connect.enable_snowflake_extension_behavior | false | When true, enables Snowflake-specific extensions that can differ from Spark behavior (such as hashing MAP types or the MD5 return type). | 1.0.0
snowpark.connect.handleIntegralOverflow | false | When true, integral overflow behavior is aligned with the Spark approach. | 1.7.0
snowpark.connect.iceberg.external_volume | (none) | Snowflake external volume name for Iceberg table operations. | 1.0.0
snowpark.connect.integralTypesEmulation | client_default | Controls conversion of decimal to integral types. Values: client_default, enabled, or disabled. | 1.7.0
snowpark.connect.scala.version | 2.12 | Controls the Scala version used (supports 2.12 or 2.13). | 1.7.0
snowpark.connect.sql.partition.external_table_location | (none) | External table location path for partitioned writes. | 1.4.0
snowpark.connect.temporary.views.create_in_snowflake | false | When true, creates temporary views directly in Snowflake instead of managing them locally. | 1.0.1
snowpark.connect.udf.imports [DEPRECATED 1.7.0] | (none) | Comma-separated list of files or modules to import for UDF execution. Triggers UDF recreation when changed. | 1.0.0
snowpark.connect.udf.python.imports | (none) | Comma-separated list of files or modules to import for Python UDF execution. Triggers UDF recreation when changed. | 1.7.0
snowpark.connect.udf.java.imports | (none) | Comma-separated list of files or modules to import for Java UDF execution. Triggers UDF recreation when changed. | 1.7.0
snowpark.connect.udf.packages | (none) | Comma-separated list of Python packages to include when registering UDFs. | 1.0.0
snowpark.connect.udtf.compatibility_mode | false | When true, enables Spark-compatible UDTF behavior for improved compatibility with Spark UDTF semantics. | 1.0.0
snowpark.connect.version | <current_version> | Read-only. Returns the current Snowpark Connect for Spark version. | 1.0.0
snowpark.connect.views.duplicate_column_names_handling_mode | rename | How to handle duplicate column names in views. Values: rename (add a suffix), fail (raise an error), or drop (remove duplicates). | 1.0.0
spark.sql.parquet.enable.summary-metadata | false | When true, generates Parquet summary metadata files (_metadata, _common_metadata) during Parquet writes. | 1.4.0

fs.azure.sas.<container>.<account>.blob.core.windows.net

Specifies the Azure SAS token for Blob Storage authentication. Used when reading or writing to Azure Blob Storage locations.

Default: (none)

Since: 1.0.0

fs.azure.sas.fixed.token.<account>.dfs.core.windows.net

Specifies the Azure SAS token for ADLS Gen2 (Data Lake Storage) authentication. Used when reading or writing to Azure Data Lake Storage Gen2 locations.

Default: (none)

Since: 1.0.0
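
A minimal sketch of supplying an ADLS Gen2 SAS token before reading from a container; the account, container, path, and token are placeholders:

# Provide a SAS token for the storage account (placeholder values)
spark.conf.set(
    "fs.azure.sas.fixed.token.<account>.dfs.core.windows.net",
    "<sas-token>",
)

# Read from an ADLS Gen2 location in that account
df = spark.read.parquet("abfss://<container>@<account>.dfs.core.windows.net/<path>/")
df.show()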

mapreduce.fileoutputcommitter.marksuccessfuljobs

Specify true to generate a _SUCCESS file after successful write operations for compatibility with Hadoop or Spark workflows.

Default: false

Since: 1.0.0

parquet.enable.summary-metadata

Alternative configuration for generating Parquet summary metadata files. Setting either this property or spark.sql.parquet.enable.summary-metadata enables the feature.

Default: false

Since: 1.4.0

snowflake.repartition.for.writes

Specify true to force DataFrame.repartition(n) to split output into n files during writes. Matches Spark behavior but adds overhead.

Default: false

Since: 1.0.0
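
For example, a sketch of a write where repartitioning controls the number of output files; the stage path is a placeholder:

# Force repartition(n) to produce n output files, matching Spark behavior
spark.conf.set("snowflake.repartition.for.writes", "true")

# Writing with repartition(4) now produces four output files
df = spark.range(1000)
df.repartition(4).write.mode("overwrite").parquet("@my_stage/output/")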

snowpark.connect.cte.optimization_enabled

Specify true to enable Common Table Expression (CTE) optimization in the Snowpark session for query performance.

Default: false

Since: 1.0.0

Comments

This configuration enables Snowflake Common Table Expressions (CTEs). It optimizes Snowflake queries that contain many repetitive code blocks, improving both query compilation and execution performance.

snowpark.connect.describe_cache_ttl_seconds

Specifies the time to live, in seconds, for query cache entries. Reduces repeated schema lookups.

Default: 300

Since: 1.0.0

snowpark.connect.enable_snowflake_extension_behavior

Specify true to enable Snowflake-specific extensions that can differ from Spark behavior (such as hashing MAP types or the MD5 return type).

Default: false

Since: 1.0.0

Comments

When set to true, this property changes the behavior of certain operations, such as allowing hash functions on MAP type columns and changing the MD5 return type.
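
As an illustration, the following sketch enables the extension behavior and then hashes a MAP column, which is rejected by default; it assumes the extension behavior permits the hash as described above:

from pyspark.sql.functions import create_map, lit, hash as hash_

# Allow Snowflake-specific behavior, including hashing MAP type columns
spark.conf.set("snowpark.connect.enable_snowflake_extension_behavior", "true")

df = spark.createDataFrame([(1,)], ["id"]).select(
    create_map(lit("k"), lit("v")).alias("m")
)
df.select(hash_("m")).show()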

snowpark.connect.handleIntegralOverflow

Specify true to align integral overflow behavior with the Spark approach.

Default: false

Since: 1.7.0

snowpark.connect.iceberg.external_volume

Specifies the Snowflake external volume name for Iceberg table operations.

Default: (none)

Since: 1.0.0

snowpark.connect.integralTypesEmulation

Specifies how decimal types are converted to integral types. Values: client_default, enabled, or disabled.

Default: client_default

Since: 1.7.0

Comments

By default, Snowpark Connect for Spark treats all integral types as Long types. This is caused by the way numbers are represented in Snowflake. Integral types emulation allows for an exact mapping between Snowpark and Spark types when reading from data sources.

The default option, client_default, activates the emulation only when the script is executed from the Scala client. Integral types are mapped based on the following precisions:

Precision | Spark Type
19 | LongType
10 | IntegerType
5 | ShortType
3 | ByteType
Other | DecimalType(precision, 0)

For any other precision, the final type is mapped to DecimalType(precision, 0).
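
A minimal sketch of turning the emulation on explicitly before reading a table; the table name is a placeholder, and the resulting Spark types follow the mapping above:

# Map NUMBER columns to exact integral types instead of LongType
spark.conf.set("snowpark.connect.integralTypesEmulation", "enabled")

# For example, a NUMBER(10, 0) column is read as IntegerType and NUMBER(5, 0) as ShortType
df = spark.read.table("<database>.<schema>.<table>")
df.printSchema()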

snowpark.connect.scala.version

Specifies the Scala version to use (supports 2.12 or 2.13).

Default: 2.12

Since: 1.7.0

snowpark.connect.sql.partition.external_table_location

Specifies the external table location path for partitioned writes.

Default: (none)

Since: 1.4.0

Comments

To read only an exact subset of partitioned files from the provided directory, additional configuration is required. This feature is available only for files stored on external stages. To prune the files that are read, Snowpark Connect for Spark uses external tables.

This feature is enabled when the snowpark.connect.sql.partition.external_table_location configuration is set. Its value should contain existing database and schema names where the external tables will be created.

Reading Parquet files stored on external stages creates an external table; for files on internal stages, no external table is created. Providing the schema reduces execution time by eliminating the cost of inferring it from the sources.

For best performance, filter according to the Snowflake External Tables filtering limitations.

Example

from pyspark.sql.functions import col, lit
from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType

# External tables are created in the database and schema named here
spark.conf.set("snowpark.connect.sql.partition.external_table_location", "<database-name>.<schema-name>")

# Read with schema inference
spark.read.parquet("@external-stage/example").filter(col("x") > lit(1)).show()

# Providing the schema avoids the cost of inferring it from the sources
schema = StructType([StructField("x", IntegerType()), StructField("y", DoubleType())])
spark.read.schema(schema).parquet("@external-stage/example").filter(col("x") > lit(1)).show()

snowpark.connect.temporary.views.create_in_snowflake

Specify true to create temporary views directly in Snowflake instead of managing them locally.

Default: false

Since: 1.0.1

snowpark.connect.udf.imports [DEPRECATED 1.7.0]

Specifies a comma-separated list of files and modules to import for UDF execution. When this value is changed, it triggers UDF recreation.

Default: (none)

Since: 1.0.0

snowpark.connect.udf.python.imports

Specifies a comma-separated list of files and modules to import for Python UDF execution. When this value is changed, it triggers UDF recreation.

Default: (none)

Since: 1.7.0
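
A minimal sketch of attaching a previously staged helper module to Python UDFs; the stage path, module, and greeting function are hypothetical:

from pyspark.sql.functions import udf

# Make a staged module available inside Python UDFs (hypothetical stage path)
spark.conf.set("snowpark.connect.udf.python.imports", "[@my_stage/helpers.py]")

@udf(returnType="string")
def greet(name: str) -> str:
    import helpers  # imported from the staged file (hypothetical module)
    return helpers.greeting(name)

spark.range(1).selectExpr("'world' AS name").select(greet("name")).show()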

snowpark.connect.udf.java.imports

Specifies a comma-separated list of files and modules to import for Java UDF execution. Triggers UDF recreation when changed.

Default: (none)

Since: 1.7.0

Comments

This configuration works much like snowpark.connect.udf.python.imports. You can use it to specify external libraries and files for Java UDFs created using registerJavaFunction. The two configurations are mutually exclusive, to prevent unnecessary dependency mixing.

To include external libraries and files, provide stage paths to the files as the value of the snowpark.connect.udf.java.imports configuration setting. The value is an array of stage paths separated by commas.

Example

Code in the following example includes a staged JAR file in the UDF’s execution context. The UDF loads classes from this file and uses them in its logic.

# Files need to be previously staged
spark.conf.set("snowpark.connect.udf.java.imports", "[@stage/library.jar]")

# Register a Java UDF backed by a class from the staged JAR
spark.udf.registerJavaFunction("javaFunction", "com.example.ExampleFunction")

spark.sql("SELECT javaFunction('arg')").show()

You can use the snowpark.connect.udf.java.imports setting to include other kinds of files as well, such as those with data your code needs to read. Note that when you do this, your code should only read from the included files; any writes to such files will be lost after the function’s execution ends.

snowpark.connect.udf.packages

Specifies a comma-separated list of Python packages to include when registering UDFs.

Default: (none)

Since: 1.0.0

Comments

You can use this to define additional packages to be available in Python UDFs. The value is a comma-separated list of dependencies.

You can discover the list of supported packages by executing the following SQL in Snowflake:

SELECT * FROM INFORMATION_SCHEMA.PACKAGES WHERE LANGUAGE = 'python';
Example
spark.conf.set("snowpark.connect.udf.packages", "[numpy]")

@udtf(returnType="val: int")

class Powers:

  def eval(self, x: int):
      import numpy as np

      for v in np.power(np.array([x, x, x]), [0, 1, 2]):
          yield (int(v),)

spark.udtf.register(name="powers", f=Powers)

spark.sql("SELECT * FROM powers(10)").show()
Copy

For more information, see Python.

snowpark.connect.udtf.compatibility_mode

Specify true to enable Spark-compatible UDTF behavior for improved compatibility with Spark UDTF semantics.

Default: false

Since: 1.0.0

Comments

This property determines whether UDTFs should use Spark-compatible behavior or the default Snowpark behavior. When set to true, it applies a compatibility wrapper that mimics Spark’s output type coercion and error handling patterns.

When enabled, UDTFs use a compatibility wrapper that applies Spark-style automatic type coercion (e.g., string “true” to boolean, boolean to integer) and error handling. The wrapper also converts table arguments to Row-like objects for both positional and named access, and properly handles SQL null values to match Spark’s behavior patterns.
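
The following sketch enables the mode and registers a UDTF that yields the string "true" for a boolean output column, relying on the Spark-style coercion described above:

from pyspark.sql.functions import udtf

# Apply the Spark-compatibility wrapper to UDTF output handling
spark.conf.set("snowpark.connect.udtf.compatibility_mode", "true")

@udtf(returnType="flag: boolean")
class Flags:
    def eval(self, x: int):
        # The wrapper coerces the string "true" to a boolean, Spark-style
        yield ("true",)

spark.udtf.register(name="flags", f=Flags)
spark.sql("SELECT * FROM flags(1)").show()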

snowpark.connect.version

Returns the current Snowpark Connect for Spark version. Read only.

Default: <current_version>

Since: 1.0.0
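
For example, you can read it back from the session configuration:

# Read-only property; returns the Snowpark Connect for Spark version string
print(spark.conf.get("snowpark.connect.version"))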

snowpark.connect.views.duplicate_column_names_handling_mode

Specifies how to handle duplicate column names in views. Allowed values are rename (add a suffix), fail (raise an error), or drop (remove duplicates).

Default: rename

Since: 1.0.0

Comments

Snowflake does not support duplicate column names.

Example

The following code fails at the view creation step with the SQL compilation error “duplicate column name ‘foo’”.

df = spark.createDataFrame([
    (1, 1),
    (2, 2)
], ["foo", "foo"])

df.show()  # works

df.createTempView("df_view")  # Fails with SQL compilation error: duplicate column name 'foo'

To work around this, set the snowpark.connect.views.duplicate_column_names_handling_mode configuration option to one of the following values:

  • rename: A suffix such as _dedup_1, _dedup_2, and so on will be appended to all of the duplicate column names after the first one.

  • drop: All of the duplicate columns except one will be dropped. If the columns have different values, this might lead to incorrect results.
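
For example, a minimal sketch of the rename workaround described above:

# Keep duplicate columns by renaming them with a _dedup suffix
spark.conf.set("snowpark.connect.views.duplicate_column_names_handling_mode", "rename")

df = spark.createDataFrame([(1, 1), (2, 2)], ["foo", "foo"])
df.createTempView("df_view")  # succeeds; duplicates after the first are renamed

spark.sql("SELECT * FROM df_view").show()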
