Convert an Iceberg table

This topic provides information about converting an Iceberg table that Snowflake doesn’t manage into a table that uses Snowflake as the Iceberg catalog.

You might choose to convert a table when you want full Snowflake platform support, including support for the Snowflake Iceberg Catalog SDK.

To learn about the differences between Iceberg table types, see Iceberg catalog options.

Before and after table conversion

When you convert an Iceberg table to use Snowflake as the catalog, the table becomes writable, and Snowflake assumes life-cycle management of the table.

The following table compares Iceberg tables before and after conversion:

|  | Before conversion | After conversion |
|---|---|---|
| Iceberg catalog | An external catalog (such as AWS Glue), or no catalog at all. Requires a catalog integration. | Snowflake. Snowflake registers changes to the table data in the Snowflake catalog and then updates the table metadata on your external volume. Doesn’t require a catalog integration. |
| Snowflake read operations | Supported | Supported |
| Snowflake write operations | Not supported | Supported |
| Storage location for table data and metadata | External volume (external cloud storage). | External volume (external cloud storage), under a base location that you specify. |
| Data and metadata cleanup | Managed by you or your external catalog. Snowflake doesn’t touch any externally generated data and metadata files. | Managed by Snowflake. |
| Accessible from the Snowflake Iceberg Catalog SDK | No | Yes |

Important

When you convert an Iceberg table, Snowflake doesn’t lock down or assume sole access to your external storage. To prevent table corruption, ensure that you monitor or stop any non-Snowflake writes (such as automated maintenance jobs) to your external storage location.

Requirements

Before you convert an Iceberg table, ensure that the following requirements are met:

  • You have manually refreshed the table by using the ALTER ICEBERG TABLE … REFRESH command.

  • Snowflake can write to your external volume: the ALLOW_WRITES property for the volume is set to TRUE. To update the value of this property for an existing external volume, use the ALTER EXTERNAL VOLUME command (for example, ALTER EXTERNAL VOLUME my_ext_vol SET ALLOW_WRITES=TRUE), as shown in the sketch after this list.

  • The access control permissions that you set on your cloud storage account allow write access. For example, if you use an external volume configured for Amazon S3, your IAM role must have the s3:PutObject permission for your S3 location.
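
The following is a minimal sketch of verifying and enabling writes on an external volume, assuming a hypothetical volume named my_ext_vol:

DESC EXTERNAL VOLUME my_ext_vol;  -- Check the current value of ALLOW_WRITES.

ALTER EXTERNAL VOLUME my_ext_vol SET ALLOW_WRITES=TRUE;  -- Enable writes if the property is FALSE.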

Note

Converting a table that has an un-materialized identity partition column isn’t supported. An un-materialized identity partition column is created when a table defines an identity transform using a source column that doesn’t exist in a Parquet file.

Example: Convert an Iceberg table

This example starts by creating an Iceberg table from Iceberg files in object storage. Snowflake uses the METADATA_FILE_PATH value to look for column definitions in the table metadata at the following location: <ext-vol-storage-base-url>/path/to/metadata/v1.metadata.json.

CREATE ICEBERG TABLE myIcebergTable
  EXTERNAL_VOLUME='icebergMetadataVolume'
  CATALOG='icebergCatalogInt'
  METADATA_FILE_PATH='path/to/metadata/v1.metadata.json';

Next, use the ALTER ICEBERG TABLE … REFRESH command to synchronize the table metadata with the latest metadata file. The following example command refreshes the table by specifying a metadata file path.

ALTER ICEBERG TABLE myIcebergTable REFRESH 'metadata/v2.metadata.json';

Finally, convert the table to use Snowflake as the Iceberg catalog by using an ALTER ICEBERG TABLE … CONVERT TO MANAGED command.

ALTER ICEBERG TABLE myIcebergTable CONVERT TO MANAGED
  BASE_LOCATION = myBaseLocation;

Note

In this example, the ALTER statement must specify a BASE_LOCATION because the table was created from Iceberg files in object storage and BASE_LOCATION was not part of the original CREATE ICEBERG TABLE statement. The BASE_LOCATION defines the relative path from your external volume to a directory where Snowflake writes table data and metadata for the converted table.

If BASE_LOCATION was specified in the original CREATE ICEBERG TABLE statement, you don’t need to include it in your ALTER ICEBERG TABLE … CONVERT TO MANAGED command.

For example, Snowflake writes table data to <ext-vol-storage-base-url>/myBaseLocation/data/ and writes metadata for the converted table to <ext-vol-storage-base-url>/myBaseLocation/metadata/.
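
For comparison, the following is a minimal sketch of the alternative case, in which BASE_LOCATION is specified at creation time so that the conversion command can omit it (table, volume, and path names are reused from the example above):

CREATE ICEBERG TABLE myIcebergTable
  EXTERNAL_VOLUME='icebergMetadataVolume'
  CATALOG='icebergCatalogInt'
  METADATA_FILE_PATH='path/to/metadata/v1.metadata.json'
  BASE_LOCATION='myBaseLocation';

ALTER ICEBERG TABLE myIcebergTable CONVERT TO MANAGED;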

Conversion and data types

If you convert a table with an Iceberg data type that Snowflake doesn’t support, Snowflake uses a Snowflake data type to process and return values. However, Snowflake writes the original Iceberg data type to the table data files so that you can use the table with other compute engines like Apache Spark.

For data types such as int and long, the data type that Snowflake uses for processing supports a larger range of values than the source data type. To stay consistent with the source data type, Snowflake does not allow inserting values outside the range that the source data type supports.

For example, consider a table with a column of Iceberg type int. After conversion, Snowflake processes the column values using the Snowflake data type NUMBER(10,0). NUMBER(10,0) has a range of (-9,999,999,999, +9,999,999,999), but int has a more limited range of (-2,147,483,648, +2,147,483,647). If you try to insert a value of 3,000,000,000 into that column, Snowflake returns an out-of-range error message.
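
The following is a minimal sketch of this behavior, assuming the converted table from the earlier example has a hypothetical column named int_col of Iceberg type int:

INSERT INTO myIcebergTable (int_col) VALUES (2147483647);  -- Succeeds: the value is within the int range.

INSERT INTO myIcebergTable (int_col) VALUES (3000000000);  -- Fails with an out-of-range error, even though NUMBER(10,0) could store the value.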

For more information about data types, including limitations, see Iceberg table data types.