Partitioning for Apache Iceberg™ tables¶

This preview introduces support for Iceberg partitioning when writing to Iceberg tables.

“Hidden” partitioning for Apache Iceberg™ is metadata-based and adaptable. Iceberg produces partition values based on transforms that you define when you create a table. When reading from a partitioned table, Iceberg engines use the partition values defined in your table metadata to efficiently identify relevant data.

Considerations¶

Consider the following when you use partitioned writes for Iceberg tables:

  • Changing the partition specification of a Snowflake-managed Iceberg table isn’t supported. You must drop the table and create a new one with partitioning.

  • If you use an external engine to add, drop, or replace a partition field in an externally managed table, Snowflake writes data according to the latest partition specification.

  • Snowflake uses a target file size of 16 MB for partitioned tables. If the default file size is larger, Snowflake automatically decreases the target size to 16 MB when writing to a partitioned table.

  • The GET_DDL function does not yet include the PARTITION BY clause in its output.

  • Using COPY INTO <table> on partitioned tables is only supported for LOAD_MODE = FULL_INGEST (the default value).

  • Snowpipe isn’t supported.

  • Using the variant CREATE TABLE AS SELECT or CREATE TABLE LIKE syntax with PARTITION BY isn’t supported.

  • The sum of the sizes of the outputs for all partition transforms can’t exceed 1024 bytes for a single row.

Partitioning Iceberg tables¶

To create a partitioned Iceberg table, include the PARTITION BY clause with one or more partition transforms in your regular CREATE ICEBERG TABLE statement.

For example, the following expression partitions a TIMESTAMP column named start_date by day:

PARTITION BY (DAY(start_date))
Copy

Iceberg supports various transforms for partitioning. For more information, see Partition Transforms in the Apache Iceberg™ documentation.

CREATE ICEBERG TABLE … PARTITION BY syntax¶

The following syntax block demonstrates how to specify a PARTITION BY clause when you create an Iceberg table. For the full command syntax with parameter descriptions, see CREATE ICEBERG TABLE.

CREATE ICEBERG TABLE [ IF NOT EXISTS ] <table_name>
  ...
  [ PARTITION BY ( partitionExpression [, partitionExpression , ...] ) ]
  ...
Copy

Where:

partitionExpression ::=
  <col_name> -- identity transform
  | BUCKET ( <num_buckets> , <col_name> )
  | TRUNCATE ( <width> , <col_name> )
  | YEAR ( <col_name> )
  | MONTH ( <col_name> )
  | DAY ( <col_name> )
  | HOUR ( <col_name> )
Copy

Parameters¶

col_name

Specifies the identifier (name) for the source column to partition.

When used alone (without a transform such as YEAR), specifies an identity transform on the source column. For more information, see identity.

BUCKET

Specifies a bucket transform. For more information, see Bucket Transform Details.

num_buckets is the number of buckets to group the data into.

TRUNCATE

Specifies a truncate transform, which partitions the data based on the truncated values of the specified source column. For more information, see Truncate Transform Details.

YEAR

Specifies a year transform, which extracts the year from a date or timestamp source column value. For more information, see Partition Transforms.

MONTH

Specifies a month transform. For more information, see Partition Transforms.

DAY

Specifies a day transform, which extracts the day from a date or timestamp source column value. For more information, see Partition Transforms.

HOUR

Specifies an hour transform, which extracts the hour from a timestamp source column value. For more information, see Partition Transforms.

Example: Create a partitioned table with an identity transform¶

The following example creates a Snowflake-managed Iceberg table using the value of a column named c_custkey to partition the table (also called an identity transform).

CREATE OR REPLACE ICEBERG TABLE customer_iceberg_partitioned (
  c_custkey INTEGER,
  c_name STRING,
  c_address STRING,
  c_nationkey INTEGER,
  c_phone STRING,
  c_acctbal INTEGER,
  c_mktsegment STRING,
  c_comment STRING
)
  PARTITION BY (c_custkey)
  EXTERNAL_VOLUME = 'my_ext_vol'
  CATALOG = 'SNOWFLAKE'
  BASE_LOCATION = 'customer_iceberg_partitioned';
Copy

Example: Partition a table by day¶

This example creates an Iceberg table using the value of a timestamp column named start_date to partition the table by day.

CREATE OR REPLACE ICEBERG TABLE iceberg_partitioned_date_time (start_date timestamp)
  PARTITION BY (DAY(start_date))
  EXTERNAL_VOLUME = 'my_ext_vol'
  CATALOG = 'SNOWFLAKE';
Copy

You can insert data into the table using supported table-loading features. For example, use an INSERT INTO statement to insert the following data into the empty iceberg_partitioned_date_time table created previously:

INSERT INTO iceberg_partitioned_date_time (start_date)
  VALUES
    (to_timestamp_ntz('2023-01-02 00:00:00')),
    (to_timestamp_ntz('2023-02-03 00:00:00')),
    (to_timestamp_ntz('2023-01-02 01:00:00')),
    (to_timestamp_ntz('2023-02-03 02:00:00'));
Copy