This documentation describes an older version (1.12.1).

snowflake.snowpark.DataFrameWriter.copy_into_location

DataFrameWriter.copy_into_location(location: str, *, partition_by: Optional[Union[snowflake.snowpark.column.Column, str]] = None, file_format_name: Optional[str] = None, file_format_type: Optional[str] = None, format_type_options: Optional[Dict[str, str]] = None, header: bool = False, statement_params: Optional[Dict[str, str]] = None, block: bool = True, **copy_options: Optional[str]) -> List[Row]
DataFrameWriter.copy_into_location(location: str, *, partition_by: Optional[Union[snowflake.snowpark.column.Column, str]] = None, file_format_name: Optional[str] = None, file_format_type: Optional[str] = None, format_type_options: Optional[Dict[str, str]] = None, header: bool = False, statement_params: Optional[Dict[str, str]] = None, block: bool = False, **copy_options: Optional[str]) -> AsyncJob

Executes a COPY INTO <location> command to unload data from a DataFrame into one or more files in an internal or external stage.

Parameters:
  • location – The destination stage location.

  • partition_by – Specifies an expression used to partition the unloaded table rows into separate files. It can be a Column, a column name, or a SQL expression.

  • file_format_name – Specifies an existing named file format to use for unloading data from the table. The named file format determines the format type (CSV, JSON, PARQUET), as well as any other format options, for the data files.

  • file_format_type – Specifies the type of files unloaded from the table. If a format type is specified, additional format-specific options can be specified in format_type_options.

  • format_type_options – Additional format-specific options for the given file_format_type. Use the options documented under Format Type Options in the Snowflake COPY INTO <location> reference.

  • header – Specifies whether to include the table column headings in the output files.

  • statement_params – Dictionary of statement-level parameters to set while executing this action.

  • copy_options – Keyword arguments that specify copy options (for example, overwrite=True or single=True). Use the options documented under Copy Options in the Snowflake COPY INTO <location> reference.

  • block – Whether this function waits until the result is available. When False, the underlying queries of the DataFrame are executed asynchronously and an AsyncJob is returned instead of the result rows.
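To make the mapping from keyword arguments to SQL concrete, here is a minimal sketch of a hypothetical helper, render_copy_into_location, that renders the same arguments as COPY INTO <location> text. It is not Snowpark's implementation: it assumes partition_by is a SQL string (not a Column), follows the clause order of the Snowflake COPY INTO <location> syntax, maps Python booleans to TRUE/FALSE, and inserts all other values without quoting or escaping.

```python
from typing import Dict, Optional


def _sql_value(value: object) -> str:
    # Python booleans become SQL TRUE/FALSE; other values are inserted as-is.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    return str(value)


def render_copy_into_location(
    location: str,
    query: str,
    *,
    partition_by: Optional[str] = None,
    file_format_name: Optional[str] = None,
    file_format_type: Optional[str] = None,
    format_type_options: Optional[Dict[str, str]] = None,
    header: bool = False,
    **copy_options: object,
) -> str:
    """Hypothetical helper: render the arguments as COPY INTO <location> SQL text.

    Not Snowpark's implementation; no quoting or escaping is performed.
    """
    parts = [f"COPY INTO {location} FROM ({query})"]
    if partition_by is not None:
        parts.append(f"PARTITION BY {partition_by}")
    if file_format_name is not None:
        # A named file format already determines TYPE and its format options.
        parts.append(f"FILE_FORMAT = (FORMAT_NAME = '{file_format_name}')")
    elif file_format_type is not None:
        # Inline file format: TYPE followed by the format-specific options.
        inner = [f"TYPE = {file_format_type.upper()}"]
        inner += [f"{k.upper()} = {_sql_value(v)}" for k, v in (format_type_options or {}).items()]
        parts.append(f"FILE_FORMAT = ({' '.join(inner)})")
    # Copy options (e.g. overwrite, single) become KEY = VALUE pairs; HEADER goes last.
    options = dict(copy_options)
    if header:
        options["header"] = True
    parts += [f"{k.upper()} = {_sql_value(v)}" for k, v in options.items()]
    return " ".join(parts)


print(render_copy_into_location(
    "@~/names.parquet",
    "SELECT * FROM names",
    file_format_type="parquet",
    header=True,
    overwrite=True,
    single=True,
))
# COPY INTO @~/names.parquet FROM (SELECT * FROM names) FILE_FORMAT = (TYPE = PARQUET) OVERWRITE = TRUE SINGLE = TRUE HEADER = TRUE
```

Note that when file_format_name is given, the sketch ignores file_format_type and format_type_options, since the named file format already fixes the format type and its options.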

Returns:

A list of Row objects containing the unloading results when block is True, or an AsyncJob when block is False.

Example:

>>> # save this dataframe to a parquet file on the session stage
>>> df = session.create_dataframe([["John", "Berry"], ["Rick", "Berry"], ["Anthony", "Davis"]], schema=["FIRST_NAME", "LAST_NAME"])
>>> remote_file_path = f"{session.get_session_stage()}/names.parquet"
>>> copy_result = df.write.copy_into_location(remote_file_path, file_format_type="parquet", header=True, overwrite=True, single=True)
>>> copy_result[0].rows_unloaded
3
>>> # the following snippet only verifies the file content; it is not part of the Snowpark API
>>> # download this file and read it using pyarrow
>>> import os
>>> import tempfile
>>> import pyarrow.parquet as pq
>>> with tempfile.TemporaryDirectory() as tmpdirname:
...     _ = session.file.get(remote_file_path, tmpdirname)
...     pq.read_table(os.path.join(tmpdirname, "names.parquet"))
pyarrow.Table
FIRST_NAME: string not null
LAST_NAME: string not null
----
FIRST_NAME: [["John","Rick","Anthony"]]
LAST_NAME: [["Berry","Berry","Davis"]]