You are viewing documentation about an older version (1.6.1).


Table.update(assignments: Dict[str, Union[Column, None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, list, tuple, dict]], condition: Optional[Column] = None, source: Optional[DataFrame] = None, *, statement_params: Optional[Dict[str, str]] = None, block: bool = True) → UpdateResult
Table.update(assignments: Dict[str, Union[Column, None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, list, tuple, dict]], condition: Optional[Column] = None, source: Optional[DataFrame] = None, *, statement_params: Optional[Dict[str, str]] = None, block: bool = False) → AsyncJob

Updates rows in the Table with the specified assignments and returns an UpdateResult, which reports the number of rows modified and the number of multi-joined rows modified.

  • assignments – A dict that maps column names to the values they should be updated to. Each value can be either a literal value or a Column object.

  • condition – An optional Column object that specifies which rows to update. It must be provided if source is provided.

  • source – An optional DataFrame whose columns are referenced in condition. It can also be another Table.

  • statement_params – Dictionary of statement level parameters to be set while executing this action.

  • block – A bool value indicating whether this function will wait until the result is available. When it is False, this function executes the underlying queries of the dataframe asynchronously and returns an AsyncJob.
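The block parameter described above can be illustrated with a minimal sketch. The helper names update_without_blocking and wait_for_update are hypothetical and not part of Snowpark; the sketch assumes that table is a Table obtained from session.table(...) and that AsyncJob.result() returns the same UpdateResult that a blocking call would return.

```python
def update_without_blocking(table, assignments, condition=None):
    """Start an update with block=False and return the job handle.

    Hypothetical helper: `table` is assumed to be a snowflake.snowpark.Table.
    """
    # With block=False, update() returns an AsyncJob instead of waiting
    # for the query to finish.
    return table.update(assignments, condition, block=False)


def wait_for_update(job):
    """Block until the asynchronous update finishes and return its result."""
    # Assumption: for an update, AsyncJob.result() yields an UpdateResult.
    return job.result()
```

With the table t from the examples on this page, this could be used as job = update_without_blocking(t, {"b": 0}, t["a"] == 1), followed later by wait_for_update(job) once the result is needed.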


>>> target_df = session.create_dataframe([(1, 1),(1, 2),(2, 1),(2, 2),(3, 1),(3, 2)], schema=["a", "b"])
>>> target_df.write.save_as_table("my_table", mode="overwrite", table_type="temporary")
>>> t = session.table("my_table")

>>> # update all rows in column "b" to 0 and all rows in column "a"
>>> # to the summation of column "a" and column "b"
>>> t.update({"b": 0, "a": t.a + t.b})
UpdateResult(rows_updated=6, multi_joined_rows_updated=0)
>>> t.sort("a", "b").collect()
[Row(A=2, B=0), Row(A=3, B=0), Row(A=3, B=0), Row(A=4, B=0), Row(A=4, B=0), Row(A=5, B=0)]

>>> # update all rows in column "b" to 0 where column "a" has value 1
>>> target_df.write.save_as_table("my_table", mode="overwrite", table_type="temporary")
>>> t.update({"b": 0}, t["a"] == 1)
UpdateResult(rows_updated=2, multi_joined_rows_updated=0)
>>> t.sort("a", "b").collect()
[Row(A=1, B=0), Row(A=1, B=0), Row(A=2, B=1), Row(A=2, B=2), Row(A=3, B=1), Row(A=3, B=2)]

>>> # update all rows in column "b" to 0 where column "a" in this
>>> # table is equal to column "a" in another dataframe
>>> target_df.write.save_as_table("my_table", mode="overwrite", table_type="temporary")
>>> source_df = session.create_dataframe([1, 2, 3, 4], schema=["a"])
>>> t.update({"b": 0}, t["a"] == source_df.a, source_df)
UpdateResult(rows_updated=6, multi_joined_rows_updated=0)
>>> t.sort("a", "b").collect()
[Row(A=1, B=0), Row(A=1, B=0), Row(A=2, B=0), Row(A=2, B=0), Row(A=3, B=0), Row(A=3, B=0)]
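The two counters in UpdateResult can be easier to read with a small pure-Python model of the last example. This is only a sketch: model_update_counts is a hypothetical function, it does not call Snowflake, and it assumes an equality join on a single key where a target row that matches more than one source row is updated once and also counted as multi-joined.

```python
from collections import Counter


def model_update_counts(target_keys, source_keys):
    """Model (rows_updated, multi_joined_rows_updated) for an equality join.

    Hypothetical illustration only; the counting rule is an assumption.
    """
    matches = Counter(source_keys)  # how many source rows carry each key
    # A target row is updated if at least one source row matches it.
    rows_updated = sum(1 for k in target_keys if matches[k] >= 1)
    # It is multi-joined if more than one source row matches it.
    multi_joined = sum(1 for k in target_keys if matches[k] > 1)
    return rows_updated, multi_joined


target_a = [1, 1, 2, 2, 3, 3]  # column "a" of my_table
# Source keys from the example above: every target row matches exactly once.
print(model_update_counts(target_a, [1, 2, 3, 4]))  # (6, 0)
# A source with a duplicated key: both rows with a == 1 match twice.
print(model_update_counts(target_a, [1, 1, 2]))  # (4, 2)
```

The first call reproduces the counts shown in the example above. The second is the model's own output for a source with a duplicated key and has not been checked against a live table.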