Snowflake Iceberg Catalog SDK

This topic covers how to configure and use the Snowflake Iceberg Catalog SDK. The SDK is available for Apache Iceberg versions 1.2.0 or later.

With the Snowflake Iceberg Catalog SDK, you can use the Apache Spark engine to query Iceberg tables.

Supported catalog operations

The SDK supports the following commands for browsing Iceberg metadata in Snowflake:

  • SHOW NAMESPACES

  • USE NAMESPACE

  • SHOW TABLES

  • USE DATABASE

  • USE SCHEMA

The SDK currently supports read operations (SELECT statements) only.

Install and connect

To install the Snowflake Iceberg Catalog SDK, download the latest version of the Iceberg libraries.

Before you can use the Snowflake Iceberg Catalog SDK, you need a Snowflake database with one or more Iceberg tables. To create an Iceberg table, see Create an Iceberg table.

After you establish a connection and the SDK confirms that Iceberg metadata is present, Snowflake accesses your Parquet data using the external volume that is associated with your Iceberg table(s).

Examples

To read table data with the SDK, start by configuring the following properties for your Spark cluster:

spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.13:1.2.0,net.snowflake:snowflake-jdbc:3.13.28
# Configure a catalog named "snowflake_catalog" using the standard Iceberg SparkCatalog adapter
--conf spark.sql.catalog.snowflake_catalog=org.apache.iceberg.spark.SparkCatalog
# Specify the implementation of the named catalog to be Snowflake's Catalog implementation
--conf spark.sql.catalog.snowflake_catalog.catalog-impl=org.apache.iceberg.snowflake.SnowflakeCatalog
# Provide a Snowflake JDBC URI with which the Snowflake Catalog will perform low-level communication with Snowflake services
--conf spark.sql.catalog.snowflake_catalog.uri='jdbc:snowflake://<account_identifier>.snowflakecomputing.com'
# Configure the Snowflake user on whose behalf to perform Iceberg metadata lookups
--conf spark.sql.catalog.snowflake_catalog.jdbc.user=<user_name>
# Provide the user password. To configure the credentials, you can provide either password or private_key_file.
--conf spark.sql.catalog.snowflake_catalog.jdbc.password=<password>
# Configure the private_key_file to use when connecting to Snowflake services; additional connection options can be found at https://docs.snowflake.com/en/user-guide/jdbc-configure.html
--conf spark.sql.catalog.snowflake_catalog.jdbc.private_key_file=<location of the private key>
Copy

Note

You can use any Snowflake-supported JDBC driver connection parameter in your configuration by using the following syntax: --conf spark.sql.catalog.snowflake_catalog.jdbc.property-name=property-value

After you configure your Spark cluster, you can check which tables are available to query. For example:

spark.sessionState.catalogManager.setCurrentCatalog("snowflake_catalog");
spark.sql("SHOW NAMESPACES").show()
spark.sql("SHOW NAMESPACES IN my_database").show()
spark.sql("USE my_database.my_schema").show()
spark.sql("SHOW TABLES").show()
Copy

Then you can select a table to query.

spark.sql("SELECT * FROM my_database.my_schema.my_table WHERE ").show()
Copy

You can use the DataFrame structure with languages like Python and Scala to query data.

df = spark.table("my_database.my_schema.my_table")
df.show()
Copy

Note

If you receive vectorized read errors while running queries, you can disable the vectorized reads for your session by configuring: spark.sql.iceberg.vectorization.enabled=false. To keep using vectorized reads, work with your account team representative to set an account parameter.

Query caching

When you issue a query, Snowflake caches the result within a certain time frame (90 seconds by default). You might experience latency up to that duration. If you plan to access data programmatically for comparison purposes, you can set the spark.sql.catalog..cache-enabled property to false to disable caching.

If your application is designed to tolerate a specific amount of latency, you can use the following property to specify the latency period: spark.sql.catalog..cache.expiration-interval-ms.

Limitations

The following limitations apply to the Snowflake Iceberg Catalog SDK and are subject to change:

  • Only Apache Spark is supported for reading Iceberg tables.

  • Time travel queries are not supported.

  • You cannot use the SDK to access non-Iceberg Snowflake tables.