Partitioning for Apache Iceberg™ tables¶
This preview introduces support for Iceberg partitioning when writing to Iceberg tables.
“Hidden” partitioning for Apache Iceberg™ is metadata-based and adaptable. Iceberg produces partition values based on transforms that you define when you create a table. When reading from a partitioned table, Iceberg engines use the partition values defined in your table metadata to efficiently identify relevant data.
Considerations¶
Consider the following when you use partitioned writes for Iceberg tables:
Changing the partition specification of a Snowflake-managed Iceberg table isn’t supported. You must drop the table and create a new one with partitioning.
If you use an external engine to add, drop, or replace a partition field in an externally managed table, Snowflake writes data according to the latest partition specification.
Snowflake uses a target file size of 16 MB for partitioned tables. If the default file size is larger, Snowflake automatically decreases the target size to 16 MB when writing to a partitioned table.
The GET_DDL function does not yet include the PARTITION BY clause in its output.
Using COPY INTO <table> on partitioned tables is only supported for
LOAD_MODE = FULL_INGEST
(the default value).Snowpipe isn’t supported.
Using the variant CREATE TABLE AS SELECT or CREATE TABLE LIKE syntax with PARTITION BY isn’t supported.
The sum of the sizes of the outputs for all partition transforms can’t exceed 1024 bytes for a single row.
Partitioning Iceberg tables¶
To create a partitioned Iceberg table, include the PARTITION BY clause with one or more partition transforms in your regular CREATE ICEBERG TABLE statement.
For example, the following expression partitions a TIMESTAMP column named start_date
by day:
PARTITION BY (DAY(start_date))
Iceberg supports various transforms for partitioning. For more information, see Partition Transforms in the Apache Iceberg™ documentation.
CREATE ICEBERG TABLE … PARTITION BY syntax¶
The following syntax block demonstrates how to specify a PARTITION BY clause when you create an Iceberg table. For the full command syntax with parameter descriptions, see CREATE ICEBERG TABLE.
CREATE ICEBERG TABLE [ IF NOT EXISTS ] <table_name>
...
[ PARTITION BY ( partitionExpression [, partitionExpression , ...] ) ]
...
Where:
partitionExpression ::= <col_name> -- identity transform | BUCKET ( <num_buckets> , <col_name> ) | TRUNCATE ( <width> , <col_name> ) | YEAR ( <col_name> ) | MONTH ( <col_name> ) | DAY ( <col_name> ) | HOUR ( <col_name> )
Parameters¶
col_name
Specifies the identifier (name) for the source column to partition.
When used alone (without a transform such as YEAR), specifies an identity transform on the source column. For more information, see identity.
BUCKET
Specifies a bucket transform. For more information, see Bucket Transform Details.
num_buckets
is the number of buckets to group the data into.TRUNCATE
Specifies a truncate transform, which partitions the data based on the truncated values of the specified source column. For more information, see Truncate Transform Details.
YEAR
Specifies a year transform, which extracts the year from a date or timestamp source column value. For more information, see Partition Transforms.
MONTH
Specifies a month transform. For more information, see Partition Transforms.
DAY
Specifies a day transform, which extracts the day from a date or timestamp source column value. For more information, see Partition Transforms.
HOUR
Specifies an hour transform, which extracts the hour from a timestamp source column value. For more information, see Partition Transforms.
Example: Create a partitioned table with an identity transform¶
The following example creates a Snowflake-managed Iceberg table using the value of a column named c_custkey
to
partition the table (also called an identity transform).
CREATE OR REPLACE ICEBERG TABLE customer_iceberg_partitioned (
c_custkey INTEGER,
c_name STRING,
c_address STRING,
c_nationkey INTEGER,
c_phone STRING,
c_acctbal INTEGER,
c_mktsegment STRING,
c_comment STRING
)
PARTITION BY (c_custkey)
EXTERNAL_VOLUME = 'my_ext_vol'
CATALOG = 'SNOWFLAKE'
BASE_LOCATION = 'customer_iceberg_partitioned';
Example: Partition a table by day¶
This example creates an Iceberg table using the value of a timestamp column named start_date
to
partition the table by day.
CREATE OR REPLACE ICEBERG TABLE iceberg_partitioned_date_time (start_date timestamp)
PARTITION BY (DAY(start_date))
EXTERNAL_VOLUME = 'my_ext_vol'
CATALOG = 'SNOWFLAKE';
You can insert data into the table using supported table-loading features. For example, use an INSERT INTO statement to
insert the following data into the empty iceberg_partitioned_date_time
table created previously:
INSERT INTO iceberg_partitioned_date_time (start_date)
VALUES
(to_timestamp_ntz('2023-01-02 00:00:00')),
(to_timestamp_ntz('2023-02-03 00:00:00')),
(to_timestamp_ntz('2023-01-02 01:00:00')),
(to_timestamp_ntz('2023-02-03 02:00:00'));