You are viewing documentation about an older version (1.3.0).

snowflake.snowpark.DataFrame.with_columns

DataFrame.with_columns(col_names: List[str], values: List[Column | TableFunctionCall]) → DataFrame

Returns a DataFrame with additional columns named col_names. The columns are computed from the expressions in values; a single table function call can supply more than one column, as in Example 2.

If columns with the same names already exist in the DataFrame, those columns are removed and the new columns are appended at the end.
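
The ordering rule above can be illustrated with a small plain-Python model. This is only a sketch of the described behavior, not Snowpark code: the helper name model_with_columns_order is hypothetical, and it assumes names are compared exactly as given, ignoring any identifier case normalization Snowflake may apply.

```python
from typing import List


def model_with_columns_order(existing: List[str], col_names: List[str]) -> List[str]:
    """Model only the resulting column order (hypothetical helper, not a Snowpark API)."""
    replaced = set(col_names)
    # Existing columns that share a name with a new column are removed.
    kept = [name for name in existing if name not in replaced]
    # The new columns are appended at the end, in the order given.
    return kept + list(col_names)


# "A" already exists, so it is removed from the front and re-added at the end.
print(model_with_columns_order(["A", "B"], ["A", "C"]))  # ['B', 'A', 'C']
```

Under this model, replacing an existing column changes its position: the DataFrame no longer starts with "A" once "A" is redefined.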

Example 1:

>>> from typing import Iterable, Tuple
>>> from snowflake.snowpark.functions import udtf
>>> @udtf(output_schema=["number"])
... class sum_udtf:
...     def process(self, a: int, b: int) -> Iterable[Tuple[int]]:
...         yield (a + b, )
>>> df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
>>> df.with_columns(["mean", "total"], [(df["a"] + df["b"]) / 2, sum_udtf(df.a, df.b)]).sort(df.a).show()
----------------------------------
|"A"  |"B"  |"MEAN"    |"TOTAL"  |
----------------------------------
|1    |2    |1.500000  |3        |
|3    |4    |3.500000  |7        |
----------------------------------

Example 2:

>>> from snowflake.snowpark.functions import lit, table_function
>>> split_to_table = table_function("split_to_table")
>>> df = session.sql("select 'James' as name, 'address1 address2 address3' as addresses")
>>> df.with_columns(["seq", "idx", "val"], [split_to_table(df.addresses, lit(" "))]).show()
------------------------------------------------------------------
|"NAME"  |"ADDRESSES"                 |"SEQ"  |"IDX"  |"VAL"     |
------------------------------------------------------------------
|James   |address1 address2 address3  |1      |1      |address1  |
|James   |address1 address2 address3  |1      |2      |address2  |
|James   |address1 address2 address3  |1      |3      |address3  |
------------------------------------------------------------------
Parameters: