Partitioning for Apache Iceberg™ tables¶

This preview introduces support for Iceberg partitioning when writing to Iceberg tables.

“Hidden” partitioning for Apache Iceberg™ is metadata-based and adaptable. Iceberg produces partition values based on transforms that you define when you create a table. When reading from a partitioned table, Iceberg engines use the partition values defined in your table metadata to efficiently identify relevant data.

Considerations¶

Consider the following when you use partitioned writes for Iceberg tables:

This feature is supported for both Snowflake-managed Iceberg tables and externally managed tables with write support (currently in preview).
Changing the partition specification of a Snowflake-managed Iceberg table isn’t supported. You must drop the table and create a new one with partitioning.
If you use an external engine to add, drop, or replace a partition field in an externally managed table, Snowflake writes data according to the latest partition specification.
Snowflake uses a target file size of 16 MB for partitioned tables. If the default file size is larger, Snowflake automatically decreases the target size to 16 MB when writing to a partitioned table.
The GET_DDL function does not yet include the PARTITION BY clause in its output.
Using COPY INTO <table> on partitioned tables is only supported for LOAD_MODE = FULL_INGEST (the default value).
Snowpipe isn’t supported.
Using the variant CREATE TABLE AS SELECT or CREATE TABLE LIKE syntax with PARTITION BY isn’t supported.
The sum of the sizes of the outputs for all partition transforms can’t exceed 1024 bytes for a single row.

Partitioning Iceberg tables¶

To create a partitioned Iceberg table, include the PARTITION BY clause with one or more partition transforms in your regular CREATE ICEBERG TABLE statement.

For example, the following expression partitions a TIMESTAMP column named start_date by day:

PARTITION BY (DAY(start_date))

Copy

Iceberg supports various transforms for partitioning. For more information, see Partition Transforms in the Apache Iceberg™ documentation.

CREATE ICEBERG TABLE … PARTITION BY syntax¶

The following syntax block demonstrates how to specify a PARTITION BY clause when you create an Iceberg table. For the full command syntax with parameter descriptions, see CREATE ICEBERG TABLE.

CREATE ICEBERG TABLE [ IF NOT EXISTS ] <table_name>
  ...
  [ PARTITION BY ( partitionExpression [, partitionExpression , ...] ) ]
  ...

Copy

Where:

partitionExpression ::=
  <col_name> -- identity transform
  | BUCKET ( <num_buckets> , <col_name> )
  | TRUNCATE ( <width> , <col_name> )
  | YEAR ( <col_name> )
  | MONTH ( <col_name> )
  | DAY ( <col_name> )
  | HOUR ( <col_name> )

Copy

Parameters¶

col_name

Specifies the identifier (name) for the source column to partition.

When used alone (without a transform such as YEAR), specifies an identity transform on the source column. For more information, see identity.

BUCKET

Specifies a bucket transform. For more information, see Bucket Transform Details.

num_buckets is the number of buckets to group the data into.

TRUNCATE

Specifies a truncate transform, which partitions the data based on the truncated values of the specified source column. For more information, see Truncate Transform Details.

YEAR

Specifies a year transform, which extracts the year from a date or timestamp source column value. For more information, see Partition Transforms.

MONTH

Specifies a month transform. For more information, see Partition Transforms.

DAY

Specifies a day transform, which extracts the day from a date or timestamp source column value. For more information, see Partition Transforms.

HOUR

Specifies an hour transform, which extracts the hour from a timestamp source column value. For more information, see Partition Transforms.

Example: Create a partitioned table with an identity transform¶

The following example creates a Snowflake-managed Iceberg table using the value of a column named c_custkey to partition the table (also called an identity transform).

CREATE OR REPLACE ICEBERG TABLE customer_iceberg_partitioned (
  c_custkey INTEGER,
  c_name STRING,
  c_address STRING,
  c_nationkey INTEGER,
  c_phone STRING,
  c_acctbal INTEGER,
  c_mktsegment STRING,
  c_comment STRING
)
  PARTITION BY (c_custkey)
  EXTERNAL_VOLUME = 'my_ext_vol'
  CATALOG = 'SNOWFLAKE'
  BASE_LOCATION = 'customer_iceberg_partitioned';

Copy

Example: Partition a table by day¶

This example creates an externally managed Iceberg table by using the value of a timestamp column named start_date to partition the table by day.

CREATE OR REPLACE ICEBERG TABLE iceberg_partitioned_date_time (start_date timestamp)
  PARTITION BY (DAY(start_date))
  EXTERNAL_VOLUME = 'my_ext_vol'
  CATALOG = 'my_catalog_int';

Copy

You can insert data into the table using supported table-loading features. For example, use an INSERT INTO statement to insert the following data into the empty iceberg_partitioned_date_time table created previously:

INSERT INTO iceberg_partitioned_date_time (start_date)
  VALUES
    (to_timestamp_ntz('2023-01-02 00:00:00')),
    (to_timestamp_ntz('2023-02-03 00:00:00')),
    (to_timestamp_ntz('2023-01-02 01:00:00')),
    (to_timestamp_ntz('2023-02-03 02:00:00'));

Copy