snowflake.snowpark.DataFrameWriter.saveAsTable

DataFrameWriter.saveAsTable(table_name: Union[str, Iterable[str]], *, mode: Optional[str] = None, column_order: str = 'index', create_temp_table: bool = False, table_type: Literal['', 'temp', 'temporary', 'transient'] = '', clustering_keys: Optional[Iterable[Union[Column, str]]] = None, statement_params: Optional[Dict[str, str]] = None, block: bool = True, comment: Optional[str] = None, enable_schema_evolution: Optional[bool] = None, data_retention_time: Optional[int] = None, max_data_extension_time: Optional[int] = None, change_tracking: Optional[bool] = None, copy_grants: bool = False, iceberg_config: Optional[Dict[str, Union[str, Iterable[Union[Column, str]]]]] = None, table_exists: Optional[bool] = None, overwrite_condition: Optional[Union[Column, str]] = None, **kwargs: Optional[Dict[str, Any]]) → Optional[AsyncJob]

Writes the data to the specified table in a Snowflake database.

Parameters:

table_name – A string or list of strings representing the table name. A string is used as the table name; an iterable of strings is treated as the parts of the fully-qualified object identifier (database name, schema name, and table name).
mode –
One of the following values. When it’s None or not provided, the save mode set by mode() is used.

“append”: Append data of this DataFrame to the existing table. Creates a table if it does not exist.

“overwrite”: Overwrite the existing table. By default, drops and recreates the table.
When overwrite_condition is specified, performs a selective overwrite: deletes only rows matching the condition, then inserts new data.

“truncate”: Overwrite the existing table by truncating the old table.

“errorifexists”: Throw an exception if the table already exists.

“ignore”: Ignore this operation if the table already exists.
column_order –
When mode is “append”, data is inserted into the target table by matching either column sequence or column name. Default is “index”. For any other mode, column_order has no effect.

“index”: Data will be inserted into the target table by column sequence.

“name”: Data will be inserted into the target table by matching column names. Use this option if the target table has more columns than the source DataFrame.
create_temp_table – (Deprecated) The to-be-created table will be temporary if this is set to True.
table_type – The type of table to create. The supported values are: temp, temporary, and transient. An empty string creates a permanent table. Not applicable to Iceberg tables. See the Snowflake documentation on table types for details.
clustering_keys – Specifies one or more columns or column expressions in the table as the clustering key. See Clustering Keys & Clustered Tables for more details.
comment – Adds a comment for the created table. See COMMENT. This argument is ignored if a table already exists and save mode is append or truncate.
enable_schema_evolution – Enables or disables automatic changes to the table schema from data loaded into the table from source files. Setting to True enables automatic schema evolution and setting to False disables it. If not set, the default behavior is used.
data_retention_time – Specifies the retention period for the table in days so that Time Travel actions (SELECT, CLONE, UNDROP) can be performed on historical data in the table.
max_data_extension_time – Specifies the maximum number of days for which Snowflake can extend the data retention period for the table to prevent streams on the table from becoming stale.
change_tracking – Specifies whether to enable change tracking for the table. If not set, the default behavior is used.
copy_grants – When True, retains the access privileges from the original table when a new table is created with “overwrite” mode.
statement_params – Dictionary of statement level parameters to be set while executing this action.
block – A bool value indicating whether this function waits until the result is available. When it is False, this function executes the underlying queries of the DataFrame asynchronously and returns an AsyncJob.
iceberg_config –
A dictionary that can contain the following Iceberg configuration values:
- partition_by: specifies one or more partition expressions for the Iceberg table.
  Can be a single Column, column name, SQL expression string, or a list of these. Supports identity partitioning (column names) as well as partition transform functions like bucket(), truncate(), year(), month(), day(), hour().
- external_volume: specifies the identifier for the external volume where
  the Iceberg table stores its metadata files and data in Parquet format
- catalog: specifies either Snowflake or a catalog integration to use for this table
- base_location: the base directory that Snowflake can write Iceberg metadata and files to
- target_file_size: specifies a target Parquet file size for the table.
  Valid values: ‘AUTO’ (default), ‘16MB’, ‘32MB’, ‘64MB’, ‘128MB’
- catalog_sync: optionally sets the catalog integration configured for Polaris Catalog
- storage_serialization_policy: specifies the storage serialization policy for the table
- iceberg_version: overrides the version of Iceberg to use. Defaults to 2 when unset.
table_exists – Optional parameter to specify whether the table is known to exist. Set to True if the table exists, False if it doesn’t, or None (default) for automatic detection. Mainly useful with the “append” and “truncate” modes, and with “overwrite” when overwrite_condition is set, because it avoids running a query to detect the table.
overwrite_condition – Specifies the condition for an atomic, targeted delete-insert. Can only be used when mode is “overwrite”. When provided and the table exists, rows matching the condition are atomically deleted and all rows from the DataFrame are inserted, preserving non-matching rows. When not provided, the default “overwrite” behavior applies (drop and recreate table). If the table does not exist, overwrite_condition is ignored and the table is created normally.
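
The save modes listed under mode can be pictured with a small in-memory model. This is only a sketch in plain Python, not Snowpark code: the function save_rows and the tables dictionary are hypothetical names, and the model assumes that "truncate" creates the table when it is missing, which the description above does not state.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, ...]


def save_rows(tables: Dict[str, List[Row]], name: str, rows: List[Row], mode: str) -> None:
    """Toy in-memory model of the save modes; not Snowpark code."""
    exists = name in tables
    if mode == "append":
        # Create the table if it is missing, then add the new rows.
        tables.setdefault(name, []).extend(rows)
    elif mode in ("overwrite", "truncate"):
        # Both leave only the new rows. The real modes differ in whether the
        # table is dropped and recreated or truncated in place.
        tables[name] = list(rows)
    elif mode == "errorifexists":
        if exists:
            raise ValueError(f"table {name!r} already exists")
        tables[name] = list(rows)
    elif mode == "ignore":
        # Do nothing when the table is already there.
        if not exists:
            tables[name] = list(rows)
    else:
        raise ValueError(f"unknown mode: {mode!r}")


tables: Dict[str, List[Row]] = {}
save_rows(tables, "my_table", [(1, 2), (3, 4)], "overwrite")
save_rows(tables, "my_table", [(1, 2), (3, 4)], "append")
print(tables["my_table"])  # the two rows now appear twice
save_rows(tables, "my_table", [(9, 9)], "ignore")  # no-op: table exists
```

The model treats "overwrite" and "truncate" alike because it tracks rows only; it does not capture table-level properties such as grants or comments.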

Example 1:

Basic table saves

>>> df = session.create_dataframe([[1,2],[3,4]], schema=["a", "b"])
>>> df.write.mode("overwrite").save_as_table("my_table", table_type="temporary")
>>> session.table("my_table").collect()
[Row(A=1, B=2), Row(A=3, B=4)]
>>> df.write.save_as_table("my_table", mode="append", table_type="temporary")
>>> session.table("my_table").collect()
[Row(A=1, B=2), Row(A=3, B=4), Row(A=1, B=2), Row(A=3, B=4)]
>>> df.write.mode("overwrite").save_as_table("my_transient_table", table_type="transient")
>>> session.table("my_transient_table").collect()
[Row(A=1, B=2), Row(A=3, B=4)]
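
The append in Example 1 uses the default column_order of "index". The difference between "index" and "name" can be sketched in plain Python. The helper align_row is hypothetical and is not part of Snowpark; it assumes that target columns missing from the source end up as None (standing in for NULL) and that a column-count mismatch fails in "index" mode.

```python
from typing import List, Optional, Sequence


def align_row(
    source_cols: Sequence[str],
    source_row: Sequence[object],
    target_cols: Sequence[str],
    column_order: str = "index",
) -> List[Optional[object]]:
    """Toy model of how one appended row lines up with the target columns."""
    if column_order == "index":
        # Position decides: the i-th source value goes to the i-th target column.
        if len(source_row) != len(target_cols):
            raise ValueError("column counts differ")  # assumption of this model
        return list(source_row)
    if column_order == "name":
        # Name decides; target columns missing from the source become None
        # (standing in for NULL, an assumption of this model).
        by_name = dict(zip(source_cols, source_row))
        return [by_name.get(col) for col in target_cols]
    raise ValueError(f"unknown column_order: {column_order!r}")


# The source DataFrame lists its columns as B, A; the target table as A, B.
print(align_row(["B", "A"], (20, 10), ["A", "B"], "index"))
print(align_row(["B", "A"], (20, 10), ["A", "B"], "name"))
# A target with an extra column C only works by name in this model.
print(align_row(["B", "A"], (20, 10), ["A", "B", "C"], "name"))
```

In the first call the value meant for B lands in A, which is the kind of silent misplacement that matching by name avoids.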

Example 2:

Saving a DataFrame to an Iceberg table. Note that the external_volume, catalog, and base_location must already have been set up externally.
See “Create your first Iceberg table” (https://docs.snowflake.com/en/user-guide/tutorials/create-your-first-iceberg-table) for more information on creating Iceberg resources.

>>> df = session.create_dataframe([[1,2],[3,4]], schema=["a", "b"])
>>> from snowflake.snowpark.functions import col, bucket
>>> iceberg_config = {
...     "external_volume": "example_volume",
...     "catalog": "example_catalog",
...     "base_location": "/iceberg_root",
...     "storage_serialization_policy": "OPTIMIZED",
...     "target_file_size": "128MB",
...     "partition_by": ["a", bucket(3, col("b"))],
... }
>>> df.write.mode("overwrite").save_as_table("my_table", iceberg_config=iceberg_config) # doctest: +SKIP
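
Because iceberg_config is a plain dictionary, a misspelled key is easy to miss. The following is a minimal sketch of a pre-flight check that compares a config against the keys and target_file_size values listed in the parameter description. The function check_iceberg_config is hypothetical and not part of Snowpark, and it says nothing about what validation Snowpark or Snowflake perform themselves.

```python
from typing import Any, Dict, List

# Keys and values copied from the iceberg_config parameter description.
KNOWN_KEYS = {
    "partition_by",
    "external_volume",
    "catalog",
    "base_location",
    "target_file_size",
    "catalog_sync",
    "storage_serialization_policy",
    "iceberg_version",
}
TARGET_FILE_SIZES = {"AUTO", "16MB", "32MB", "64MB", "128MB"}


def check_iceberg_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"unknown key: {key}" for key in sorted(set(config) - KNOWN_KEYS)]
    size = config.get("target_file_size")
    if size is not None and size not in TARGET_FILE_SIZES:
        problems.append(f"unsupported target_file_size: {size}")
    return problems


good = {
    "external_volume": "example_volume",
    "catalog": "example_catalog",
    "base_location": "/iceberg_root",
    "target_file_size": "128MB",
}
bad = {"base_locaton": "/iceberg_root", "target_file_size": "256MB"}
print(check_iceberg_config(good))
print(check_iceberg_config(bad))
```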

Example 3:

Using overwrite_condition for targeted delete and insert:

>>> from snowflake.snowpark.functions import col
>>> df = session.create_dataframe([[1, "a"], [2, "b"], [3, "c"]], schema=["id", "val"])
>>> df.write.mode("overwrite").save_as_table("my_table", table_type="temporary")
>>> session.table("my_table").order_by("id").collect()
[Row(ID=1, VAL='a'), Row(ID=2, VAL='b'), Row(ID=3, VAL='c')]

>>> new_df = session.create_dataframe([[2, "updated2"], [5, "updated5"]], schema=["id", "val"])
>>> new_df.write.mode("overwrite").save_as_table("my_table", overwrite_condition="id = 1 or val = 'b'")
>>> session.table("my_table").order_by("id").collect()
[Row(ID=2, VAL='updated2'), Row(ID=3, VAL='c'), Row(ID=5, VAL='updated5')]
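
The result of Example 3 can be traced by hand with a plain-Python model of the delete-insert step. This is a sketch only: selective_overwrite is a hypothetical helper, the condition is written as a Python predicate rather than a SQL expression, and the model does not represent the atomicity of the real operation.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]


def selective_overwrite(
    existing: List[Row],
    new_rows: List[Row],
    condition: Callable[[Row], bool],
) -> List[Row]:
    """Drop existing rows that match the condition, then add every new row."""
    kept = [row for row in existing if not condition(row)]
    return kept + list(new_rows)


existing = [(1, "a"), (2, "b"), (3, "c")]
new_rows = [(2, "updated2"), (5, "updated5")]

# Python equivalent of the SQL condition "id = 1 or val = 'b'".
result = selective_overwrite(existing, new_rows, lambda r: r[0] == 1 or r[1] == "b")
print(sorted(result))
# [(2, 'updated2'), (3, 'c'), (5, 'updated5')]
```

Rows 1 and 2 match the condition and are removed, row 3 is preserved, and both new rows are inserted whether or not they match the condition themselves.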