snowflake.snowpark.DataFrame.copy_into_table¶
- DataFrame.copy_into_table(table_name: Union[str, Iterable[str]], *, files: Optional[Iterable[str]] = None, pattern: Optional[str] = None, validation_mode: Optional[str] = None, target_columns: Optional[Iterable[str]] = None, transformations: Optional[Iterable[Union[Column, str]]] = None, format_type_options: Optional[Dict[str, Any]] = None, statement_params: Optional[Dict[str, str]] = None, iceberg_config: Optional[dict] = None, **copy_options: Any) List[Row] [source]¶
Executes a COPY INTO <table> command to load data from files in a stage location into a specified table.
It returns the load result described in OUTPUT section of the COPY INTO <table> command. The returned result also depends on the value of
validation_mode
.It’s slightly different from the
COPY INTO
command in that this method will automatically create a table if the table doesn’t exist and the input files are CSV files whereas theCOPY INTO <table>
doesn’t.To call this method, this DataFrame must be created from a
DataFrameReader
.Example:
>>> # Create a CSV file to demo load >>> import tempfile >>> with tempfile.NamedTemporaryFile(mode="w+t") as t: ... t.writelines(["id1, Product A", "\n" "id2, Product B"]) ... t.flush() ... create_stage_result = session.sql("create temp stage if not exists test_stage").collect() ... put_result = session.file.put(t.name, "@test_stage/copy_into_table_dir", overwrite=True) >>> # user_schema is used to read from CSV files. For other files it's not needed. >>> from snowflake.snowpark.types import StringType, StructField, StringType >>> from snowflake.snowpark.functions import length >>> user_schema = StructType([StructField("product_id", StringType()), StructField("product_name", StringType())]) >>> # Use the DataFrameReader (session.read below) to read from CSV files. >>> df = session.read.schema(user_schema).csv("@test_stage/copy_into_table_dir") >>> # specify target column names. >>> target_column_names = ["product_id", "product_name"] >>> drop_result = session.sql("drop table if exists copied_into_table").collect() # The copy will recreate the table. >>> copied_into_result = df.copy_into_table("copied_into_table", target_columns=target_column_names, force=True) >>> session.table("copied_into_table").show() --------------------------------- |"PRODUCT_ID" |"PRODUCT_NAME" | --------------------------------- |id1 | Product A | |id2 | Product B | ---------------------------------
The arguments of this function match the optional parameters of the COPY INTO <table>.
- Parameters:
table_name – A string or list of strings representing table name. If input is a string, it represents the table name; if input is of type iterable of strings, it represents the fully-qualified object identifier (database name, schema name, and table name).
files – Specific files to load from the stage location.
pattern – The regular expression that is used to match file names of the stage location.
validation_mode – A
str
that instructs theCOPY INTO <table>
command to validate the data files instead of loading them into the specified table. Values can be “RETURN_n_ROWS”, “RETURN_ERRORS”, or “RETURN_ALL_ERRORS”. Refer to the above mentionedCOPY INTO <table>
command optional parameters for more details.target_columns – Name of the columns in the table where the data should be saved.
transformations – A list of column transformations.
format_type_options – A dict that contains the
formatTypeOptions
of theCOPY INTO <table>
command.statement_params – Dictionary of statement level parameters to be set while executing this action.
iceberg_config –
A dictionary that can contain the following iceberg configuration values:
- external_volume: specifies the identifier for the external volume where
the Iceberg table stores its metadata files and data in Parquet format
catalog: specifies either Snowflake or a catalog integration to use for this table
base_location: the base directory that snowflake can write iceberg metadata and files to
catalog_sync: optionally sets the catalog integration configured for Polaris Catalog
storage_serialization_policy: specifies the storage serialization policy for the table
copy_options – The kwargs that is used to specify the
copyOptions
of theCOPY INTO <table>
command.