Snowflake Connector for Spark

The Snowflake Connector for Spark (“Spark connector”) brings Snowflake into the Apache Spark ecosystem, enabling Spark to read data from, and write data to, Snowflake. From Spark’s perspective, Snowflake looks similar to other Spark data sources (PostgreSQL, HDFS, S3, etc.).
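Once the connector is on the classpath, Snowflake is addressed like any other Spark data source. The following is a minimal sketch in Scala; the option keys (sfURL, sfUser, and so on) are the connector's standard connection options, while the account, credentials, and table names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Runnable in spark-shell or a submitted job, with the connector on the classpath.
val spark = SparkSession.builder().appName("snowflake-example").getOrCreate()

// Standard connector connection options; all values here are placeholders.
val sfOptions = Map(
  "sfURL"       -> "myaccount.snowflakecomputing.com",
  "sfUser"      -> "MY_USER",
  "sfPassword"  -> "MY_PASSWORD",
  "sfDatabase"  -> "MY_DB",
  "sfSchema"    -> "PUBLIC",
  "sfWarehouse" -> "MY_WH"
)

// Read a Snowflake table into a Spark DataFrame.
val df = spark.read
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("dbtable", "MY_TABLE")
  .load()

// Write the DataFrame back to another Snowflake table.
df.write
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("dbtable", "MY_TABLE_COPY")
  .mode("overwrite")
  .save()
```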

Note

As an alternative to using Spark, consider writing your code to use the Snowpark API instead. Snowpark lets you perform all of your work within Snowflake (rather than in a separate Spark compute cluster), and it supports pushdown of all operations, including Snowflake UDFs. However, when you want to enforce row and column policies on Iceberg tables, use the Snowflake Connector for Spark. For more information, see Enforce data protection policies when querying Apache Iceberg™ tables from Apache Spark™.
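For comparison, here is a minimal Snowpark (Scala) sketch of the same kind of work; everything after the session setup is pushed down and executed as SQL inside Snowflake, and all connection values are placeholders:

```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

// Create a session directly against Snowflake (placeholder values).
val session = Session.builder.configs(Map(
  "URL"       -> "https://myaccount.snowflakecomputing.com",
  "USER"      -> "MY_USER",
  "PASSWORD"  -> "MY_PASSWORD",
  "WAREHOUSE" -> "MY_WH",
  "DB"        -> "MY_DB",
  "SCHEMA"    -> "PUBLIC"
)).create

// The filter and aggregation run inside Snowflake;
// no separate Spark compute cluster is involved.
val counts = session.table("MY_TABLE")
  .filter(col("AMOUNT") > 100)
  .groupBy(col("REGION"))
  .count()

counts.show()
```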

Snowflake supports multiple versions of the Spark connector:

  • Spark Connector 2.x: Spark versions 3.2, 3.3, and 3.4.

    • There is a separate connector release for each of these Spark versions; use the release that matches your version of Spark.

  • Spark Connector 3.x: Spark versions 3.2, 3.3, 3.4, and 3.5.

    • A single Spark Connector 3.x package supports multiple Spark versions, so you don’t need a separate build for each Spark release.

The connector runs as a Spark plugin and is provided as a Spark package (spark-snowflake).
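For example, with Spark Connector 2.x you would pull the build that matches your Spark version from Maven Central. The following sketch declares it as an sbt dependency; the coordinates follow the connector's published naming scheme, but the version numbers shown are illustrative, so check the Maven Central listing for current releases:

```scala
// build.sbt (sketch): a Spark Connector 2.x build encodes its target Spark
// release in the version string (here, a 2.x build for Spark 3.4).
libraryDependencies ++= Seq(
  "net.snowflake" %% "spark-snowflake" % "2.16.0-spark_3.4",
  // The connector uses the Snowflake JDBC driver underneath.
  "net.snowflake" %  "snowflake-jdbc"  % "3.16.1"
)
```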

Enforce data protection policies on Apache Iceberg tables accessed from Spark

Snowflake supports enforcing row access and data masking policies on Apache Iceberg tables that you query from Apache Spark™ through Snowflake Horizon Catalog. To enable this enforcement, you must install version 3.1.6 or later of the Spark connector. The Spark connector connects Spark to Snowflake so that the policies configured on the Iceberg tables can be evaluated. For more information, see Enforce data protection policies when querying Apache Iceberg™ tables from Apache Spark™.
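In build terms, that means depending on a 3.x connector release at or above 3.1.6. A sketch as an sbt dependency (same caveat as above: verify the current artifact and version on Maven Central):

```scala
// build.sbt (sketch): Spark Connector 3.x uses a plain version number and is
// not tied to a single Spark release; per the docs above, 3.1.6 or later is
// required for policy enforcement on Iceberg tables.
libraryDependencies += "net.snowflake" %% "spark-snowflake" % "3.1.6"
```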
