snowflake.snowpark.DataFrameWriter.copy_into_location
- DataFrameWriter.copy_into_location(location: str, *, partition_by: ColumnOrSqlExpr | None = None, file_format_name: str | None = None, file_format_type: str | None = None, format_type_options: Dict[str, str] | None = None, header: bool = False, statement_params: Dict[str, str] | None = None, block: bool = True, **copy_options: str | None) → List[Row]
- DataFrameWriter.copy_into_location(location: str, *, partition_by: ColumnOrSqlExpr | None = None, file_format_name: str | None = None, file_format_type: str | None = None, format_type_options: Dict[str, str] | None = None, header: bool = False, statement_params: Dict[str, str] | None = None, block: bool = False, **copy_options: str | None) → AsyncJob
Executes a COPY INTO <location> command to unload data from a DataFrame into one or more files in a stage or external stage.

- Parameters:
  - location – The destination stage location.
  - partition_by – Specifies an expression used to partition the unloaded table rows into separate files. It can be a Column, a column name, or a SQL expression.
  - file_format_name – Specifies an existing named file format to use for unloading data from the table. The named file format determines the format type (CSV, JSON, PARQUET), as well as any other format options, for the data files.
  - file_format_type – Specifies the type of files unloaded from the table. If a format type is specified, additional format-specific options can be specified in format_type_options.
  - format_type_options – Depending on the file_format_type specified, you can include additional format-specific options. Use the options documented in the Format Type Options.
  - header – Specifies whether to include the table column headings in the output files.
  - statement_params – Dictionary of statement-level parameters to set while executing this action.
  - copy_options – Keyword arguments that specify the copy options. Use the options documented in the Copy Options.
  - block – (Experimental) A bool value indicating whether this function waits until the result is available. When it is False, this function executes the underlying queries of the dataframe asynchronously and returns an AsyncJob.
- Returns:
  A list of Row objects containing the unloading results.
Example:
>>> # save this dataframe to a parquet file on the session stage
>>> df = session.create_dataframe([["John", "Berry"], ["Rick", "Berry"], ["Anthony", "Davis"]], schema=["FIRST_NAME", "LAST_NAME"])
>>> remote_file_path = f"{session.get_session_stage()}/names.parquet"
>>> copy_result = df.write.copy_into_location(remote_file_path, file_format_type="parquet", header=True, overwrite=True, single=True)
>>> copy_result[0].rows_unloaded
3
>>> # the following code snippet just verifies the file content and is actually irrelevant to Snowpark
>>> # download this file and read it using pyarrow
>>> import os
>>> import tempfile
>>> import pyarrow.parquet as pq
>>> with tempfile.TemporaryDirectory() as tmpdirname:
...     _ = session.file.get(remote_file_path, tmpdirname)
...     pq.read_table(os.path.join(tmpdirname, "names.parquet"))
pyarrow.Table
FIRST_NAME: string not null
LAST_NAME: string not null
----
FIRST_NAME: [["John","Rick","Anthony"]]
LAST_NAME: [["Berry","Berry","Davis"]]
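To see how the keyword arguments in the example map onto the underlying SQL, the helper below is an illustrative sketch that assembles an approximate COPY INTO <location> statement from the same parameters. It is an assumption for illustration only, not Snowpark's actual SQL generator (the real implementation handles quoting, validation, and option normalization differently):

```python
# Illustrative sketch: builds an approximate COPY INTO <location> statement
# from copy_into_location()-style arguments. NOT Snowpark's real SQL generator.

def sketch_copy_into_sql(location, query, *, partition_by=None,
                         file_format_type=None, format_type_options=None,
                         header=False, **copy_options):
    def sql_value(value):
        # Render Python booleans as SQL TRUE/FALSE; pass other values through.
        return str(value).upper() if isinstance(value, bool) else str(value)

    parts = [f"COPY INTO {location}", f"FROM ({query})"]
    if partition_by is not None:
        parts.append(f"PARTITION BY ({partition_by})")
    if file_format_type is not None:
        fmt = [f"TYPE = {file_format_type.upper()}"]
        # format_type_options become extra key = value pairs inside FILE_FORMAT
        for key, value in (format_type_options or {}).items():
            fmt.append(f"{key.upper()} = {sql_value(value)}")
        parts.append("FILE_FORMAT = (" + " ".join(fmt) + ")")
    # copy options such as OVERWRITE and SINGLE are emitted as key = value pairs
    for key, value in copy_options.items():
        parts.append(f"{key.upper()} = {sql_value(value)}")
    if header:
        parts.append("HEADER = TRUE")
    return " ".join(parts)

# Mirrors the keyword arguments used in the example above.
sql = sketch_copy_into_sql(
    "@my_stage/names.parquet",
    "SELECT * FROM names",
    file_format_type="parquet",
    header=True,
    overwrite=True,
    single=True,
)
print(sql)
```

Here `overwrite=True` and `single=True` flow through **copy_options, while `file_format_type` and `header` are named parameters, matching the method signature above.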