snowflake.snowpark.Session.write_pandas
- Session.write_pandas(df: Union[pandas.DataFrame, modin.pandas.DataFrame, modin.pandas.Series], table_name: str, *, database: Optional[str] = None, schema: Optional[str] = None, chunk_size: Optional[int] = None, compression: str = 'gzip', on_error: str = 'abort_statement', parallel: int = 4, quote_identifiers: bool = True, auto_create_table: bool = False, create_temp_table: bool = False, overwrite: bool = False, table_type: Literal['', 'temp', 'temporary', 'transient'] = '', use_logical_type: Optional[bool] = None, **kwargs: Dict[str, Any]) → Table
Writes a pandas DataFrame to a table in Snowflake and returns a Snowpark DataFrame object referring to the table where the pandas DataFrame was written.
- Parameters:
df – The pandas DataFrame, Snowpark pandas DataFrame, or Series to write back.
table_name – Name of the table to insert into.
database – Database that the table is in. If not provided, the default one will be used.
schema – Schema that the table is in. If not provided, the default one will be used.
chunk_size – Number of rows to be inserted per chunk. If not provided, all rows are written in a single chunk. Defaults to None normally, and to 100,000 inside a stored procedure.
compression – The compression used for the Parquet files: gzip or snappy. Gzip typically achieves better compression, while snappy is faster. Use whichever is more appropriate.
on_error – Action to take when COPY INTO statements fail. See details at copy options.
parallel – Number of threads to be used when uploading chunks. See details at parallel parameter.
quote_identifiers – By default, identifiers, specifically database, schema, table and column names (from DataFrame.columns), will be quoted. If set to False, identifiers are passed on to Snowflake without quoting, i.e. identifiers will be coerced to uppercase by Snowflake.
auto_create_table – When true, automatically creates a table to store the passed-in pandas DataFrame using the passed-in database, schema, and table_name. Note: there are usually multiple table configurations that would allow you to upload a particular pandas DataFrame successfully. If you don't like the auto-created table, you can always create your own table before calling this function. For example, auto-created tables will store list, tuple and dict as strings in a VARCHAR column.
create_temp_table – (Deprecated) The to-be-created table will be temporary if this is set to True. Note that to avoid breaking changes, currently when this is set to True, it overrides table_type.
overwrite – Default value is False, in which case the pandas DataFrame data is appended to the existing table. If set to True and auto_create_table is also set to True, the table is dropped. If set to True and auto_create_table is set to False, the table is truncated. Note that in both cases (when overwrite is set to True) the existing contents of the table are replaced with those of the passed-in pandas DataFrame.
table_type – The table type of the table to be created. The supported values are: temp, temporary, and transient. An empty string means to create a permanent table. Learn more about table types here.
use_logical_type – Boolean that specifies whether to use Parquet logical types when reading the Parquet files for the uploaded pandas DataFrame. With this file format option, Snowflake can interpret Parquet logical types during data loading. To enable Parquet logical types, set use_logical_type to True. Set to None to use Snowflake's default. For more information, see file format options.
Example:
>>> import pandas as pd
>>> pandas_df = pd.DataFrame([(1, "Steve"), (2, "Bob")], columns=["id", "name"])
>>> snowpark_df = session.write_pandas(pandas_df, "write_pandas_table", auto_create_table=True, table_type="temp")
>>> snowpark_df.sort('"id"').to_pandas()
   id   name
0   1  Steve
1   2    Bob
>>> pandas_df2 = pd.DataFrame([(3, "John")], columns=["id", "name"])
>>> snowpark_df2 = session.write_pandas(pandas_df2, "write_pandas_table", auto_create_table=False)
>>> snowpark_df2.sort('"id"').to_pandas()
   id   name
0   1  Steve
1   2    Bob
2   3   John
>>> pandas_df3 = pd.DataFrame([(1, "Jane")], columns=["id", "name"])
>>> snowpark_df3 = session.write_pandas(pandas_df3, "write_pandas_table", auto_create_table=False, overwrite=True)
>>> snowpark_df3.to_pandas()
   id  name
0   1  Jane
>>> pandas_df4 = pd.DataFrame([(1, "Jane")], columns=["id", "name"])
>>> snowpark_df4 = session.write_pandas(pandas_df4, "write_pandas_transient_table", auto_create_table=True, table_type="transient")
>>> snowpark_df4.to_pandas()
   id  name
0   1  Jane
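When chunk_size is set, the DataFrame is split into multiple Parquet files that are uploaded with up to parallel threads. A rough sketch of the resulting chunk count (the helper name num_chunks is illustrative, not part of the library):

```python
import math
from typing import Optional


def num_chunks(row_count: int, chunk_size: Optional[int]) -> int:
    """Estimate how many chunks write_pandas would produce.

    With chunk_size=None, all rows go into a single chunk (except inside
    a stored procedure, where the default chunk size is 100,000 rows).
    """
    if chunk_size is None:
        return 1
    return math.ceil(row_count / chunk_size)


print(num_chunks(250_000, 100_000))  # 3 chunks of up to 100,000 rows each
print(num_chunks(250_000, None))     # 1 chunk containing all rows
```

Uploading three chunks with parallel=4 means all three files can be staged concurrently; raising chunk_size trades memory for fewer COPY operations.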
Note
Unless auto_create_table is True, you must first create a table in Snowflake that the passed-in pandas DataFrame can be written to. If your pandas DataFrame cannot be written to the specified table, an exception will be raised.
If the dataframe is a Snowpark pandas DataFrame or Series, it will call modin.pandas.DataFrame.to_snowflake or modin.pandas.Series.to_snowflake internally to write a Snowpark pandas DataFrame into a Snowflake table.