Enable automatic table schema evolution¶
Semi-structured data tends to evolve over time. Systems that generate data add new columns to accommodate additional information, which requires downstream tables to evolve accordingly.
The structure of tables in Snowflake can evolve automatically to support the structure of new data received from the data sources. Snowflake supports the following:
Automatically adding new columns.
Automatically dropping the NOT NULL constraint from columns that are missing in new data files.
To enable table schema evolution, do the following:
If you are creating a new table, set the ENABLE_SCHEMA_EVOLUTION parameter to TRUE when you use the CREATE TABLE command.

For an existing table, modify the table using the ALTER TABLE command and set the ENABLE_SCHEMA_EVOLUTION parameter to TRUE.
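As a minimal sketch of both options (the table name my_table and its columns are illustrative, not from this topic):

```sql
-- Enable schema evolution when creating a new table:
CREATE TABLE my_table (
  id NUMBER,
  name VARCHAR
)
ENABLE_SCHEMA_EVOLUTION = TRUE;

-- Or enable it later on an existing table:
ALTER TABLE my_table SET ENABLE_SCHEMA_EVOLUTION = TRUE;
```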
Loading data from files evolves the table columns when all of the following are true:
The Snowflake table has the ENABLE_SCHEMA_EVOLUTION parameter set to TRUE.

The COPY INTO <table> statement uses the MATCH_BY_COLUMN_NAME option.

The role used to load the data has the EVOLVE SCHEMA or OWNERSHIP privilege on the table.
Additionally, for schema evolution with CSV files, when MATCH_BY_COLUMN_NAME is used with PARSE_HEADER, ERROR_ON_COLUMN_COUNT_MISMATCH must be set to FALSE.
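Putting these conditions together, a sketch of a CSV load that can evolve the target table might look like the following (the names my_table, mystage, and loader_role are hypothetical):

```sql
-- Grant the loading role the privilege required for schema evolution
-- (not needed if the role already owns the table):
GRANT EVOLVE SCHEMA ON TABLE my_table TO ROLE loader_role;

-- Load CSV data; columns are matched by the file's header row, and
-- ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE allows files whose column
-- count differs from the table's:
COPY INTO my_table
  FROM @mystage
  FILE_FORMAT = (
    TYPE = CSV
    PARSE_HEADER = TRUE
    ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
  )
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```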
Schema evolution is a standalone feature but can be used in conjunction with the schema detection support for retrieving the column definitions from a set of files in cloud storage. In combination, these features enable continuous data pipelines to create new tables from a set of data files in cloud storage and then modify columns of the tables as the schema of new source data files evolves with column additions or deletions.
Usage notes¶
This feature supports Apache Avro, Apache Parquet, CSV, JSON, and ORC files.
This feature is limited to COPY INTO <table> statements and Snowpipe data loads. INSERT operations cannot evolve the target table schema automatically.
Snowpipe Streaming data loads that use the Snowflake Ingest SDK directly do not support schema evolution. The Kafka connector with Snowpipe Streaming supports schema detection and evolution.
By default, this feature adds a maximum of 100 columns and evolves no more than one schema per COPY operation. To raise these limits, contact Snowflake Support.
There is no limit on dropping NOT NULL column constraints.
Schema evolution is tracked by the SchemaEvolutionRecord output in the following views and commands: the INFORMATION_SCHEMA COLUMNS view, the ACCOUNT_USAGE COLUMNS view, the DESCRIBE TABLE command, and the SHOW COLUMNS command.

However, for the Kafka connector with Snowpipe Streaming, schema evolution is not tracked by the SchemaEvolutionRecord output; the SchemaEvolutionRecord output always shows NULL.

When a column is manually renamed or modified after a schema evolution, the schema evolution record for that column is cleared.
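As a sketch of how you might inspect this record (the table name my_table and database my_db are hypothetical; column naming in INFORMATION_SCHEMA follows the COLUMNS view described above):

```sql
-- The SchemaEvolutionRecord appears as a column in the output of:
DESCRIBE TABLE my_table;
SHOW COLUMNS IN TABLE my_table;

-- It can also be queried from the INFORMATION_SCHEMA COLUMNS view:
SELECT column_name, schema_evolution_record
FROM my_db.information_schema.columns
WHERE table_name = 'MY_TABLE';
```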
Schema evolution isn’t supported by tasks.
Schema evolution support: Ingestion method comparison¶
The specific metadata field SchemaEvolutionRecord is used to track schema evolution. You can view this field with the INFORMATION_SCHEMA.COLUMNS View, DESCRIBE TABLE command, and SHOW COLUMNS command.
The following table summarizes schema evolution support and the corresponding SchemaEvolutionRecord tracking behavior across different Snowflake ingestion methods:
| Ingestion method | Architecture or context | Schema evolution support status | SchemaEvolutionRecord tracking behavior |
|---|---|---|---|
| File-based (batch/micro-batch) | COPY INTO <table> command | Fully supported | Visible in tracking views and commands. |
| File-based (batch/micro-batch) | Snowpipe, using automated loading | Fully supported | Visible in tracking views and commands. |
| Streaming at the row level | Snowpipe Streaming (high-performance architecture) | Fully supported | Visible in tracking views and commands. |
| Streaming at the row level | Snowpipe Streaming with classic architecture (for example, the Kafka connector) | Supported only through the Kafka connector; tracking is limited. | Always shows NULL in tracking views and commands. |
Examples¶
The following example creates a table with column definitions derived from a set of Parquet data. With automatic table schema evolution enabled for the table, further data loads from Parquet files with additional name/value pairs automatically add columns to the table:
Note that the mystage stage and my_parquet_format file format referenced in the statement must already exist. A set of files must
already be staged in the cloud storage location referenced in the stage definition.
This example builds on an example in the INFER_SCHEMA topic.
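A sketch of such a statement, following the CREATE TABLE ... USING TEMPLATE pattern from the INFER_SCHEMA documentation (the table name mytable is illustrative; mystage and my_parquet_format are the pre-existing stage and file format mentioned above):

```sql
-- Create a table whose column definitions are inferred from the staged
-- Parquet files, with schema evolution enabled:
CREATE TABLE mytable
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION => '@mystage/',
        FILE_FORMAT => 'my_parquet_format'
      )
    )
  )
  ENABLE_SCHEMA_EVOLUTION = TRUE;

-- Load the staged files; later loads of Parquet files that contain
-- additional name/value pairs add matching columns to the table:
COPY INTO mytable
  FROM @mystage
  FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```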