CREATE ICEBERG TABLE (Iceberg REST catalog)¶

Creates or replaces an Apache Iceberg™ table in the current or specified schema for an Iceberg REST catalog.

Use this command for the following scenarios:

You want to use a remote Iceberg catalog that complies with the open source Apache Iceberg REST OpenAPI specification.
You want to query a table in Snowflake Open Catalog or Apache Polaris™. For more information, see Query a table in Snowflake Open Catalog using Snowflake.
You want to create an externally managed table in a catalog-linked database. See CREATE ICEBERG TABLE (catalog-linked database).

Note

Before creating a table, you must create the external volume where the Iceberg metadata and data files are stored. For instructions, see Configure an external volume.

You also need a catalog integration for the table. For more information, see Configure a catalog integration for Apache Iceberg™ REST catalogs or Configure a catalog integration for Snowflake Open Catalog.

See also: ALTER ICEBERG TABLE, DROP ICEBERG TABLE, SHOW ICEBERG TABLES, DESCRIBE ICEBERG TABLE, UNDROP ICEBERG TABLE

Syntax¶

CREATE [ OR REPLACE ] ICEBERG TABLE [ IF NOT EXISTS ] <table_name>
  [ EXTERNAL_VOLUME = '<external_volume_name>' ]
  [ CATALOG = '<catalog_integration_name>' ]
  CATALOG_TABLE_NAME = '<rest_catalog_table_name>'
  [ CATALOG_NAMESPACE = '<catalog_namespace>' ]
  [ PATH_LAYOUT = { FLAT | HIERARCHICAL } ]
  [ TARGET_FILE_SIZE = '{ AUTO | 16MB | 32MB | 64MB | 128MB }' ]
  [ REPLACE_INVALID_CHARACTERS = { TRUE | FALSE } ]
  [ AUTO_REFRESH = { TRUE | FALSE } ]
  [ COMMENT = '<string_literal>' ]
  [ STORAGE_SERIALIZATION_POLICY = { COMPATIBLE | OPTIMIZED } ]
  [ ENABLE_ICEBERG_MERGE_ON_READ = { TRUE | FALSE } ]
  [ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
  [ WITH CONTACT ( <purpose> = <contact_name> [ , <purpose> = <contact_name> ... ] ) ]

Where:

partitionExpression ::=
  <col_name> -- identity transform
  | BUCKET ( <num_buckets> , <col_name> )
  | TRUNCATE ( <width> , <col_name> )
  | YEAR ( <col_name> )
  | MONTH ( <col_name> )
  | DAY ( <col_name> )
  | HOUR ( <col_name> )

Variant syntax¶

CREATE ICEBERG TABLE (catalog-linked database)¶

CREATE [ OR REPLACE ] ICEBERG TABLE [ IF NOT EXISTS ] <table_name>
  [
    --Column definition
    <col_name> <col_type> [ DEFAULT <col_default> ]
      [ [ WITH ] MASKING POLICY <policy_name> [ USING ( <col_name> , <cond_col1> , ... ) ] ]

    -- Additional column definitions
    [ , <col_name> <col_type> [ DEFAULT <col_default> ] [ ... ] ]
  ]
  [ PARTITION BY ( partitionExpression [ , partitionExpression , ... ] ) ]
  [ PATH_LAYOUT = { FLAT | HIERARCHICAL } ]
  [ TARGET_FILE_SIZE = '{ AUTO | 16MB | 32MB | 64MB | 128MB }' ]
  [ MAX_DATA_EXTENSION_TIME_IN_DAYS = <integer> ]
  [ AUTO_REFRESH = { TRUE | FALSE } ]
  [ REPLACE_INVALID_CHARACTERS = { TRUE | FALSE } ]
  [ COPY GRANTS ]
  [ COMMENT = '<string_literal>' ]
  [ ICEBERG_VERSION = <integer> ]
  [ ENABLE_ICEBERG_MERGE_ON_READ = { TRUE | FALSE } ]
  [ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
  [ BASE_LOCATION = '<path_to_directory_for_table_files>' ]
  [ STORAGE_SERIALIZATION_POLICY = { COMPATIBLE | OPTIMIZED } ]

Where:

partitionExpression ::=
  <col_name> -- identity transform
  | BUCKET ( <num_buckets> , <col_name> )
  | TRUNCATE ( <width> , <col_name> )
  | YEAR ( <col_name> )
  | MONTH ( <col_name> )
  | DAY ( <col_name> )
  | HOUR ( <col_name> )

CREATE ICEBERG TABLE (catalog-linked database) … AS SELECT¶

CREATE [ OR REPLACE ] ICEBERG TABLE <table_name> [ ( <col_name> [ <col_type> ] , <col_name> [ <col_type> ] , ... ) ]
  [ ... ]
  AS SELECT <query>

You can apply a masking policy to a column in a CTAS statement. Specify the masking policy after the column data type. For example:

CREATE [ OR REPLACE ] ICEBERG TABLE <table_name> ( <col1> <data_type> [ WITH ] MASKING POLICY <policy_name> [ , ... ] )
  [ ... ]
  AS SELECT <query>
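
As a minimal sketch of the CTAS form with a masking policy, the following statement assumes that the session already uses a catalog-linked database and schema. The table name customers_masked, the source table customers_src, and the policy governance_db.policies.mask_email are hypothetical; the policy is assumed to exist in a standard Snowflake database.

```sql
-- Hypothetical names throughout; the masking policy is assumed to already
-- exist in a standard Snowflake database (not the catalog-linked database).
CREATE OR REPLACE ICEBERG TABLE customers_masked (
  customer_id INT,
  email STRING WITH MASKING POLICY governance_db.policies.mask_email
)
  AS SELECT customer_id, email FROM customers_src;
```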

Required parameters¶

table_name

Specifies the identifier (name) for the table in Snowflake; must be unique for the schema in which the table is created.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more information, see Identifier requirements.

Note

To retrieve a list of tables or namespaces in your remote catalog, you can use the following functions:

CATALOG_TABLE_NAME = 'rest_catalog_table_name'

Specifies the table name as recognized by your external catalog. This parameter can’t be changed after you create the table.

Note

Don’t specify a namespace with the table name (mynamespace.mytable). To specify a namespace for this table, and override the default namespace set for the catalog integration, use the CATALOG_NAMESPACE parameter.
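
For instance, to reference a remote table mytable in the namespace mynamespace, the two values might be passed separately as in the following sketch. The external volume and catalog integration names are placeholders.

```sql
-- The table name and the namespace are passed in separate parameters,
-- not as 'mynamespace.mytable'.
CREATE ICEBERG TABLE my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_rest_catalog_integration'
  CATALOG_TABLE_NAME = 'mytable'
  CATALOG_NAMESPACE = 'mynamespace';
```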

col_name

For creating a table in a catalog-linked database (preview).

Specifies a column identifier (name). All the requirements for table identifiers also apply to column identifiers.

For more information, see Identifier requirements and Reserved & limited keywords.

Note

In addition to the standard reserved keywords, the following keywords can’t be used as column identifiers because they are reserved for ANSI-standard context functions:

CURRENT_DATE
CURRENT_ROLE
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_USER

For the list of reserved keywords, see Reserved & limited keywords.

col_type

For creating a table in a catalog-linked database (preview).

Specifies the data type for the column.

For information about the data types that can be specified for table columns, see Data types for Apache Iceberg™ tables.

Optional parameters¶

col_name col_type DEFAULT col_default

For a table that conforms to Iceberg v3, specifies both the initial default and write default for the specified column. If the data type for the column is string, you must surround the default value with single quotes.

Important

When you specify a default value for a column, you must specify a static value; you can’t specify an expression or function for the value. This requirement is in accordance with the Iceberg v3 specification and applies to both the initial default and write default.

Default values are an Iceberg v3 feature, so you can’t specify a default value for a table that conforms to Iceberg v2. For more information about using default values with Iceberg tables, see Use default values with Iceberg tables.

Note

To change the write default for the column after you create the table, run ALTER ICEBERG TABLE … ALTER COLUMN … SET WRITE DEFAULT.
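
The following sketch shows static column defaults on a v3 table. It assumes that the session already uses a catalog-linked database and schema; the table and column names are hypothetical.

```sql
-- Defaults must be static values; the string default is in single quotes.
CREATE OR REPLACE ICEBERG TABLE orders_with_defaults (
  order_id INT,
  status STRING DEFAULT 'NEW',
  quantity INT DEFAULT 1
)
  ICEBERG_VERSION = 3;  -- default values require Iceberg v3
```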

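PARTITION BY ( partitionExpression [ , partitionExpression , ... ] )

Specifies one or more partition expressions for the table. For the supported transforms, see Partition expression parameters (partitionExpression).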

PATH_LAYOUT = { FLAT | HIERARCHICAL }

Specifies the path layout that Snowflake uses when writing Parquet data files to the table:

FLAT: Snowflake writes all Parquet data files under the data/ directory for the table.
HIERARCHICAL: Snowflake writes partitioned data under the data/ directory for the table by using a hierarchical path layout. With this layout, each partition column is represented as a directory level in the path. To define these partition columns, use the PARTITION BY parameter. This layout is also called “Hive-style” partitioning.

If you specify PATH_LAYOUT = HIERARCHICAL without a PARTITION BY clause, Snowflake stores the Parquet data files by using a flat layout path. You can’t modify the path layout for an existing table, so you might set this parameter to HIERARCHICAL without specifying a PARTITION BY clause if you don’t want to use partitioning with hierarchical paths now but you might in the future.

Note

For externally managed tables that you create in a standard Snowflake database, Snowflake infers and honors the partitioning scheme that is specified by the remote catalog.

Default: FLAT
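
As an illustration of setting a hierarchical layout on a table that isn’t partitioned yet, a statement might look like the following sketch. It assumes that the session already uses a catalog-linked database and schema; the table name is hypothetical.

```sql
-- No PARTITION BY clause, so Snowflake stores data files with a flat
-- layout path, as described above for this combination.
CREATE OR REPLACE ICEBERG TABLE events_unpartitioned (event_ts TIMESTAMP)
  PATH_LAYOUT = HIERARCHICAL;
```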

MASKING POLICY policy_name

For creating a table in a catalog-linked database (preview).

Specifies the masking policy to set on a column. The masking policy must belong to a standard Snowflake database (not the catalog-linked database).

EXTERNAL_VOLUME = 'external_volume_name'

Specifies the identifier (name) for the external volume where the Iceberg table stores its metadata files and data in Parquet format. Iceberg metadata and manifest files store the table schema, partitions, snapshots, and other metadata.

If you don’t specify this parameter, the Iceberg table defaults to the external volume for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.

CATALOG = 'catalog_integration_name'

Specifies the identifier (name) of the catalog integration for this table.

If you don’t specify this parameter, the Iceberg table defaults to the catalog integration for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.
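
The inheritance described for EXTERNAL_VOLUME and CATALOG can be sketched as follows. The schema name my_db.my_schema is hypothetical, and the example assumes that the catalog integration defines a default namespace.

```sql
-- Set schema-level defaults once.
ALTER SCHEMA my_db.my_schema SET EXTERNAL_VOLUME = 'my_external_volume';
ALTER SCHEMA my_db.my_schema SET CATALOG = 'my_rest_catalog_integration';

-- Tables created in the schema can then omit both parameters.
CREATE ICEBERG TABLE my_db.my_schema.my_iceberg_table
  CATALOG_TABLE_NAME = 'my_remote_table';
```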

CATALOG_NAMESPACE = 'catalog_namespace'

Optionally specifies the namespace (for example, my_database) for the REST catalog source. By specifying a namespace with the catalog integration and then at the table level, you can use a single REST catalog integration to create Iceberg tables across different databases. If you don’t specify a namespace with the table, the table uses the default catalog namespace associated with the catalog integration.
If a default namespace isn’t specified with the catalog integration, you must specify the namespace for the REST catalog source to set a catalog namespace for the table.

Note

To retrieve a list of tables or namespaces in your remote catalog, you can use the following functions:

TARGET_FILE_SIZE = '{ AUTO | 16MB | 32MB | 64MB | 128MB }'

Specifies a target Parquet file size for the table.

'{ 16MB | 32MB | 64MB | 128MB }' specifies a fixed target file size for the table.
'AUTO' works differently, depending on the table type:
- Snowflake-managed tables: AUTO specifies that Snowflake should choose the file size for the table based on table characteristics such as size, DML patterns, ingestion workload, and clustering configuration. Snowflake automatically adjusts the file size, starting at 16 MB, for better read and write performance in Snowflake. Use this option to optimize table performance in Snowflake.
- Externally managed tables: AUTO specifies that Snowflake should aggressively scale to the largest file size (128 MB).

For more information, see Set a target file size.

Default: AUTO
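
A fixed target file size might be requested as in this sketch, which reuses the placeholder names from the examples in this topic:

```sql
-- The value is a quoted string: 'AUTO', '16MB', '32MB', '64MB', or '128MB'.
CREATE ICEBERG TABLE my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_rest_catalog_integration'
  CATALOG_TABLE_NAME = 'my_remote_table'
  TARGET_FILE_SIZE = '64MB';
```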

MAX_DATA_EXTENSION_TIME_IN_DAYS = integer

Object parameter that specifies the maximum number of days for which Snowflake can extend the data retention period for the table to prevent streams on the table from becoming stale.

For a detailed description of this parameter, see MAX_DATA_EXTENSION_TIME_IN_DAYS.

REPLACE_INVALID_CHARACTERS = { TRUE | FALSE }

Specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (�) in query results. You can only set this parameter for tables that use an external Iceberg catalog.

TRUE replaces invalid UTF-8 characters with the Unicode replacement character.
FALSE leaves invalid UTF-8 characters unchanged. Snowflake returns a user error message when it encounters invalid UTF-8 characters in a Parquet data file.

If not specified, the Iceberg table defaults to the parameter value for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.

Default: FALSE

AUTO_REFRESH = { TRUE | FALSE }

Specifies whether Snowflake should automatically poll the external Iceberg catalog that is associated with the table for metadata updates.

If no value is specified for the REFRESH_INTERVAL_SECONDS parameter on the catalog integration, Snowflake uses a default refresh interval of 30 seconds.

For more information, see automated refresh.

Default: FALSE

Note

Using AUTO_REFRESH with INFER_SCHEMA isn’t supported.

COPY GRANTS

Specifies to retain the access privileges from the original table when a new table is created by using CREATE OR REPLACE ICEBERG TABLE.

The parameter copies all privileges, except OWNERSHIP, from the existing table to the new table. The new table does not inherit any future grants defined for the object type in the schema. By default, the role that executes the CREATE ICEBERG TABLE statement owns the new table.

If the parameter is not included in the CREATE ICEBERG TABLE statement, then the new table does not inherit any explicit access privileges granted on the original table, but does inherit any future grants defined for the object type in the schema.

Note

With data sharing:
- If the existing table was shared to another account, the replacement table is also shared.
- If the existing table was shared with your account as a data consumer, and access was further granted to other roles in the account (using GRANT IMPORTED PRIVILEGES on the parent database), access is also granted to the replacement table.
The SHOW GRANTS output for the replacement table lists the grantee for the copied privileges as the role that executed the CREATE ICEBERG TABLE statement, with the current timestamp when the statement was executed.
The operation to copy grants occurs atomically in the CREATE ICEBERG TABLE command (that is, within the same transaction).
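
As a sketch, replacing a table in a catalog-linked database while keeping its existing grants might look like the following. It assumes that my_iceberg_table already exists in the current schema with privileges granted on it.

```sql
-- Privileges other than OWNERSHIP are copied from the table being replaced.
CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (
  first_name STRING,
  last_name STRING,
  amount INT,
  create_date DATE
)
  COPY GRANTS;
```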

COMMENT = 'string_literal'

Specifies a comment for the table.

Default: No value

TAG ( tag_name = 'tag_value' [ , tag_name = 'tag_value' , ... ] )

Specifies the tag name and the tag string value.

The tag value is always a string, and the maximum number of characters for the tag value is 256.

For information about specifying tags in a statement, see Tag quotas.
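
The following sketch attaches a tag at creation time. The tag cost_center is hypothetical and is assumed to exist already and to be accessible to the current role.

```sql
-- The tag value is always a string.
CREATE ICEBERG TABLE my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_rest_catalog_integration'
  CATALOG_TABLE_NAME = 'my_remote_table'
  WITH TAG (cost_center = 'sales');
```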

BASE_LOCATION = 'path_to_directory_for_table_files'

The path to a directory, which Snowflake uses to construct write paths for the table’s data and metadata files.

If you use an EXTERNAL_VOLUME, this path must be included with the storage paths that are specified for the external volume and you have the option to specify a relative path. If you specify a relative path, it is relative to the STORAGE_BASE_URL for the external volume. If not specified, Snowflake constructs a write path by using attributes such as the value of the BASE_LOCATION_PREFIX parameter and the table name.

If you’re using vended credentials, you must also specify an absolute path.

Note

This directory can’t be changed after you create a table.
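
A relative BASE_LOCATION could be supplied as in this sketch. It assumes a catalog-linked database that uses an external volume (not vended credentials); the path analytics/my_iceberg_table is hypothetical.

```sql
-- The relative path is resolved against the STORAGE_BASE_URL of the
-- external volume.
CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (
  first_name STRING,
  amount INT
)
  BASE_LOCATION = 'analytics/my_iceberg_table';
```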

ICEBERG_VERSION = integer

Specifies the version of the Apache Iceberg™ specification that the table conforms to.

Caution

Before you use other engines to upgrade an Iceberg table’s format-version in table properties to v3, ensure that the table isn’t used by engines or applications that don’t yet support v3. Downgrading format versions isn’t supported in the Apache Iceberg specification. Therefore, all readers and writers must support v3. The default version for Iceberg tables in Snowflake is v2, which can be configured to v3 if needed. Using Snowflake to perform in-place version upgrades isn’t supported at this time.

If you don’t set this parameter, the Iceberg table defaults to the Iceberg version for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.

2: The table conforms with Iceberg version 2.

3: The table conforms with Iceberg version 3.

Default: 2

For more information about this parameter, see ICEBERG_VERSION.

ENABLE_ICEBERG_MERGE_ON_READ = { TRUE | FALSE }

Specifies whether the table uses merge-on-read behavior.

If you don’t set this parameter, the Iceberg table defaults to the merge-on-read behavior that is specified for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.

Values:

TRUE: The table uses merge-on-read behavior. The delete format depends on which version of the Apache Iceberg™ table specification the table conforms to:

If the table conforms to v2, the table uses positional delete files.
If the table conforms to v3, the table uses deletion vectors.

FALSE: The table uses copy-on-write behavior.

Default: TRUE

For a detailed description of this parameter, see ENABLE_ICEBERG_MERGE_ON_READ.
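
To opt a single table out of merge-on-read, a statement might look like the following sketch, which reuses the placeholder names from the examples in this topic:

```sql
-- FALSE selects copy-on-write behavior for this table, regardless of the
-- value set at the schema, database, or account level.
CREATE ICEBERG TABLE my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_rest_catalog_integration'
  CATALOG_TABLE_NAME = 'my_remote_table'
  ENABLE_ICEBERG_MERGE_ON_READ = FALSE;
```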

WITH CONTACT ( purpose = contact [ , purpose = contact ...] )

Associate the new object with one or more contacts.

Specify the WITH CONTACT clause after all other clauses except the AS clause (if that clause is supported by this command).
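
As a sketch of clause placement, the following statement puts WITH CONTACT last. The purpose STEWARD and the contact name data_stewards_contact are assumptions for illustration; the contact is assumed to exist already.

```sql
-- WITH CONTACT comes after all other clauses.
CREATE ICEBERG TABLE my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_rest_catalog_integration'
  CATALOG_TABLE_NAME = 'my_remote_table'
  AUTO_REFRESH = TRUE
  WITH CONTACT (STEWARD = data_stewards_contact);
```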

Partition expression parameters (`partitionExpression`)¶

Snowflake supports all partition transforms in version 2 of the Apache Iceberg specification. For more information, see Partition Transforms.

For more information about partitioning Iceberg tables, see Iceberg partitioning.

col_name

Specifies the identifier (name) for the source column to partition.

When used alone, without a transform such as YEAR, specifies an identity transform on the source column. For more information, see identity.

BUCKET

Specifies a bucket transform. For more information, see Bucket Transform Details.

num_buckets is the number of buckets to group the data into.

TRUNCATE

Specifies a truncate transform, which partitions the data based on the truncated values of the specified source column. For more information, see Truncate Transform Details.

YEAR

Specifies a year transform, which extracts the year from a date or timestamp source-column value. For more information, see Partition Transforms.

MONTH

Specifies a month transform. For more information, see Partition Transforms.

DAY

Specifies a day transform, which extracts the day from a date or timestamp source-column value. For more information, see Partition Transforms.

HOUR

Specifies an hour transform, which extracts the hour from a timestamp source-column value. For more information, see Partition Transforms.
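
Several transforms can be combined in one PARTITION BY clause. The following sketch assumes that the session already uses a catalog-linked database and schema; the table name, column names, bucket count, and truncate width are hypothetical.

```sql
CREATE OR REPLACE ICEBERG TABLE orders_partitioned (
  order_id INT,
  customer_name STRING,
  order_ts TIMESTAMP
)
  PARTITION BY (
    BUCKET(16, order_id),        -- group rows into 16 buckets by order_id
    TRUNCATE(4, customer_name),  -- partition on truncated values (width 4)
    MONTH(order_ts)              -- month transform on the timestamp column
  );
```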

Access control requirements¶

A role used to execute this operation must have the following privileges at a minimum:


Privilege	Object	Notes
CREATE ICEBERG TABLE	Schema
CREATE EXTERNAL VOLUME	Account	Required to create a new external volume.
USAGE	External Volume	Required to reference an existing external volume.
CREATE INTEGRATION	Account	Required to create a new catalog integration.
USAGE	Catalog integration	Required to reference an existing catalog integration.

Operating on an object in a schema requires at least one privilege on the parent database and at least one privilege on the parent schema.

For instructions on creating a custom role with a specified set of privileges, see Creating custom roles.

For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.
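
As a sketch, the privileges for referencing an existing external volume and catalog integration might be granted as follows. The role iceberg_creator and the names my_db and my_schema are hypothetical.

```sql
-- At least one privilege on the parent database and schema.
GRANT USAGE ON DATABASE my_db TO ROLE iceberg_creator;
GRANT USAGE ON SCHEMA my_db.my_schema TO ROLE iceberg_creator;

-- Privilege to create the table in the schema.
GRANT CREATE ICEBERG TABLE ON SCHEMA my_db.my_schema TO ROLE iceberg_creator;

-- Privileges to reference the existing external volume and catalog integration.
GRANT USAGE ON EXTERNAL VOLUME my_external_volume TO ROLE iceberg_creator;
GRANT USAGE ON INTEGRATION my_rest_catalog_integration TO ROLE iceberg_creator;
```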

Usage notes¶

Examples¶

Create an Iceberg table that uses a remote Iceberg REST catalog¶

CREATE OR REPLACE ICEBERG TABLE my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_rest_catalog_integration'
  CATALOG_TABLE_NAME = 'my_remote_table'
  AUTO_REFRESH = TRUE;

Create an Iceberg table to query a table in Snowflake Open Catalog¶

This example creates an Iceberg table that you can use to Query a table in Snowflake Open Catalog using Snowflake.

CREATE ICEBERG TABLE open_catalog_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'open_catalog_int'
  CATALOG_TABLE_NAME = 'my_open_catalog_table'
  AUTO_REFRESH = TRUE;

Create an Iceberg table in a catalog-linked database¶

The following example creates a writable Iceberg table in a catalog-linked database with column definitions:

USE DATABASE my_catalog_linked_db;

USE SCHEMA 'my_namespace';

CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (
  first_name string,
  last_name string,
  amount int,
  create_date date
);

Create a partitioned table in a catalog-linked database¶

The following example creates an externally managed Iceberg table by using the value of a timestamp column named start_date to partition the table by day:

USE DATABASE my_catalog_linked_db;

USE SCHEMA 'my_namespace';

CREATE OR REPLACE ICEBERG TABLE iceberg_partitioned_date_time (start_date timestamp)
  PARTITION BY (DAY(start_date));

You can insert data into the table by using supported table-loading features. For example, use an INSERT INTO statement to insert the following data into the empty iceberg_partitioned_date_time table created previously:

INSERT INTO iceberg_partitioned_date_time (start_date)
  VALUES
    (to_timestamp_ntz('2023-01-02 00:00:00')),
    (to_timestamp_ntz('2023-02-03 00:00:00')),
    (to_timestamp_ntz('2023-01-02 01:00:00')),
    (to_timestamp_ntz('2023-02-03 02:00:00'));

For more information, see Iceberg partitioning.

Create an externally managed Iceberg v3 table¶

The following example creates an Apache Iceberg™ table that uses a remote Iceberg REST catalog and conforms to v3 of the Apache Iceberg™ specification:

Note

You don’t need to specify ICEBERG_VERSION = 3 with the command because the format version is already defined in the external catalog’s metadata, so Snowflake retrieves this version from the metadata.

CREATE ICEBERG TABLE my_v3_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_rest_catalog_integration'
  CATALOG_TABLE_NAME = 'my_remote_table'
  AUTO_REFRESH = TRUE;

Create an Iceberg v3 table in a catalog-linked database¶

The following example creates a writable Iceberg table in a catalog-linked database. The table has column definitions and conforms to v3 of the Apache Iceberg™ specification:

USE DATABASE my_catalog_linked_db;

USE SCHEMA 'my_namespace';

CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (
  first_name string,
  last_name string,
  amount int,
  create_date date
)
  ICEBERG_VERSION = 3;

Create a partitioned table in a catalog-linked database with hierarchical path layout¶

The following example creates an externally managed Iceberg table by using the value of a timestamp column named start_date to partition the table by day. Because PATH_LAYOUT = HIERARCHICAL, Snowflake writes data to the partitioned Iceberg table by using a hierarchical path layout for files where partitioning information is included in the file paths:

USE DATABASE my_catalog_linked_db;

USE SCHEMA 'my_namespace';

CREATE OR REPLACE ICEBERG TABLE iceberg_partitioned_date_time (start_date timestamp)
  PARTITION BY (DAY(start_date))
  PATH_LAYOUT = HIERARCHICAL;

INSERT INTO iceberg_partitioned_date_time (start_date)
  VALUES
    (to_timestamp_ntz('2023-01-02 00:00:00')),
    (to_timestamp_ntz('2023-02-03 00:00:00')),
    (to_timestamp_ntz('2023-01-02 01:00:00')),
    (to_timestamp_ntz('2023-02-03 02:00:00'));

For more information, see Partitioning with hierarchical paths.
