modin.pandas.to_iceberg
- modin.pandas.to_iceberg(obj: Union[DataFrame, Series], table_name: Union[str, Iterable[str]], *, iceberg_config: dict, mode: Optional[str] = None, column_order: str = 'index', clustering_keys: Optional[Iterable[Union[snowflake.snowpark.column.Column, str]]] = None, block: bool = True, comment: Optional[str] = None, enable_schema_evolution: Optional[bool] = None, data_retention_time: Optional[int] = None, max_data_extension_time: Optional[int] = None, change_tracking: Optional[bool] = None, copy_grants: bool = False, index: bool = True, index_label: Optional[Union[Hashable, Sequence[Hashable]]] = None) → Optional[AsyncJob]
- Writes the given DataFrame or Series data to the specified Iceberg table in a Snowflake database.
- Parameters:
- obj – The object to create the Iceberg table from. It must be either a Snowpark pandas DataFrame or Series.
- table_name – A string or list of strings representing the table name. If the input is a string, it represents the table name; if the input is an iterable of strings, it represents the fully-qualified object identifier (database name, schema name, and table name).
- iceberg_config – A dictionary that can contain the following Iceberg configuration values:
  - external_volume: specifies the identifier for the external volume where the Iceberg table stores its metadata files and data in Parquet format
  - catalog: specifies either Snowflake or a catalog integration to use for this table
  - base_location: the base directory that Snowflake can write Iceberg metadata and files to
  - catalog_sync: optionally sets the catalog integration configured for Polaris Catalog
  - storage_serialization_policy: specifies the storage serialization policy for the table
 
- mode – One of the following values. When it's None or not provided, the save mode set by mode() is used.
  - "append": Append data of this DataFrame to the existing table. Creates a table if it does not exist.
  - "overwrite": Overwrite the existing table by dropping the old table.
  - "truncate": Overwrite the existing table by truncating the old table.
  - "errorifexists": Throw an exception if the table already exists.
  - "ignore": Ignore this operation if the table already exists.
- column_order – When mode is "append", data will be inserted into the target table by matching column sequence or column name. Default is "index". When mode is not "append", column_order makes no difference. An illustrative append sketch follows the main example below.
  - "index": Data will be inserted into the target table by column sequence.
  - "name": Data will be inserted into the target table by matching column names. If the target table has more columns than the source DataFrame, use this one.
- clustering_keys – Specifies one or more columns or column expressions in the table as the clustering key. See Clustering Keys & Clustered Tables for more details. 
- block – A bool value indicating whether this function will wait until the result is available. When it is False, this function executes the underlying queries of the DataFrame asynchronously and returns an AsyncJob. A non-blocking usage sketch follows the main example below.
- comment – Adds a comment for the created table. See COMMENT. This argument is ignored if a table already exists and the save mode is append or truncate.
- enable_schema_evolution – Enables or disables automatic changes to the table schema from data loaded into the table from source files. Setting it to True enables automatic schema evolution and setting it to False disables it. If not set, the default behavior is used.
- data_retention_time – Specifies the retention period for the table in days so that Time Travel actions (SELECT, CLONE, UNDROP) can be performed on historical data in the table. 
- max_data_extension_time – Specifies the maximum number of days for which Snowflake can extend the data retention period for the table to prevent streams on the table from becoming stale. 
- change_tracking – Specifies whether to enable change tracking for the table. If not set, the default behavior is used. 
- copy_grants – When true, retain the access privileges from the original table when a new table is created with "overwrite" mode.
- index – Default True. If True, save DataFrame index columns as table columns.
- index_label – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. 
 
- Example:

  Saving DataFrame to an Iceberg table. Note that the external_volume, catalog, and base_location should have been set up externally. See `Create your first Iceberg table <https://docs.snowflake.com/en/user-guide/tutorials/create-your-first-iceberg-table>`_ for more information on creating Iceberg resources.

  >>> df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
  >>> iceberg_config = {
  ...     "external_volume": "example_volume",
  ...     "catalog": "example_catalog",
  ...     "base_location": "/iceberg_root",
  ...     "storage_serialization_policy": "OPTIMIZED",
  ... }
  >>> pd.to_iceberg(df.to_snowpark_pandas(), "my_table", iceberg_config=iceberg_config, mode="overwrite")  # doctest: +SKIP
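  A further sketch, illustrative only, assuming the my_table Iceberg table and iceberg_config from the example above already exist: appending additional rows while matching columns by name rather than by position via column_order="name". The more_df name and its rows are made up for this illustration.

  >>> more_df = session.create_dataframe([[5, 6], [7, 8]], schema=["a", "b"])  # illustrative rows only
  >>> pd.to_iceberg(
  ...     more_df.to_snowpark_pandas(),
  ...     "my_table",
  ...     iceberg_config=iceberg_config,
  ...     mode="append",
  ...     column_order="name",
  ... )  # doctest: +SKIP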
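  A non-blocking sketch under the same assumptions: with block=False the call returns an AsyncJob instead of waiting, and the job's result() method (the standard Snowpark AsyncJob method) blocks until the write completes.

  >>> job = pd.to_iceberg(
  ...     df.to_snowpark_pandas(),
  ...     "my_table",
  ...     iceberg_config=iceberg_config,
  ...     mode="append",
  ...     block=False,
  ... )  # doctest: +SKIP
  >>> job.result()  # wait for the asynchronous write to finish  # doctest: +SKIP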