
snowflake.snowpark.AsyncJob

class snowflake.snowpark.AsyncJob(query_id: str, query: str | None, session: Session, result_type: _AsyncResultType = _AsyncResultType.ROW, post_actions: List[Query] | None = None, **kwargs)[source]

Bases: object

Provides a way to track an asynchronous query in Snowflake. A DataFrame object can be evaluated asynchronously and an AsyncJob object will be returned. With this instance, you can:

  • retrieve results;

  • check the query status (still running or done);

  • cancel the running query;

  • retrieve the query ID and perform other operations on this query ID manually.

AsyncJob instances are created by Session.create_async_job() or by action methods in DataFrame and other classes. All methods in DataFrame with a _nowait suffix execute asynchronously and return an AsyncJob instance; they are equivalent to the corresponding methods called with block=False. To use these methods, you therefore need a dataframe to operate on.

First, we create a dataframe:
>>> from snowflake.snowpark.functions import when_matched, when_not_matched
>>> df = session.create_dataframe([[float(4), 3, 5], [2.0, -4, 7], [3.0, 5, 6],[4.0,6,8]], schema=["a", "b", "c"])
Example 1

DataFrame.collect() can be performed asynchronously:

>>> async_job = df.collect_nowait()
>>> async_job.result()
[Row(A=4.0, B=3, C=5), Row(A=2.0, B=-4, C=7), Row(A=3.0, B=5, C=6), Row(A=4.0, B=6, C=8)]

You can also do:

>>> async_job = df.collect(block=False)
>>> async_job.result()
[Row(A=4.0, B=3, C=5), Row(A=2.0, B=-4, C=7), Row(A=3.0, B=5, C=6), Row(A=4.0, B=6, C=8)]
Example 2

DataFrame.to_pandas() can be performed asynchronously:

>>> async_job = df.to_pandas(block=False)
>>> async_job.result()
     A  B  C
0  4.0  3  5
1  2.0 -4  7
2  3.0  5  6
3  4.0  6  8
Example 3

DataFrame.first() can be performed asynchronously:

>>> async_job = df.first(block=False)
>>> async_job.result()
[Row(A=4.0, B=3, C=5)]
Example 4

DataFrame.count() can be performed asynchronously:

>>> async_job = df.count(block=False)
>>> async_job.result()
4
Example 5

Saving a dataframe to a table can also be performed asynchronously:

>>> table_name = "name"
>>> async_job = df.write.save_as_table(table_name, block=False)

Example 6

Copying a dataframe into a stage file can also be performed asynchronously:

>>> remote_location = f"{session.get_session_stage()}/name.csv"
>>> async_job = df.write.copy_into_location(remote_location, block=False)
>>> async_job.result()[0]['rows_unloaded']
4
Example 7

Table.merge(), Table.update(), and Table.delete() can also be performed asynchronously:

>>> target_df = session.create_dataframe([(10, "old"), (10, "too_old"), (11, "old")], schema=["key", "value"])
>>> target_df.write.save_as_table("my_table", mode="overwrite", table_type="temporary")
>>> target = session.table("my_table")
>>> source = session.create_dataframe([(10, "new"), (12, "new"), (13, "old")], schema=["key", "value"])
>>> async_job = target.merge(source, target["key"] == source["key"], [when_matched().update({"value": source["value"]}), when_not_matched().insert({"key": source["key"]})], block=False)
>>> async_job.result()
MergeResult(rows_inserted=2, rows_updated=2, rows_deleted=0)
Example 8

Cancel the running query associated with the dataframe:

>>> df = session.sql("select SYSTEM$WAIT(3)")
>>> async_job = df.collect_nowait()
>>> async_job.cancel()
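
Instead of cancelling or blocking on result(), you can also poll the query status with is_done(). A minimal sketch (not one of the numbered examples; the wait time is illustrative):

>>> import time
>>> df = session.sql("select SYSTEM$WAIT(3)")
>>> async_job = df.collect_nowait()
>>> while not async_job.is_done():
...     time.sleep(1)  # do other work, or simply wait before polling again
>>> rows = async_job.result()  # returns quickly because the query has already finished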
Example 9

Executing two queries asynchronously is faster than executing them one by one:

>>> from time import time
>>> df1 = session.sql("select SYSTEM$WAIT(3)")
>>> df2 = session.sql("select SYSTEM$WAIT(3)")
>>> start = time()
>>> sync_res1 = df1.collect()
>>> sync_res2 = df2.collect()
>>> time1 = time() - start
>>> start = time()
>>> async_job1 = df1.collect_nowait()
>>> async_job2 = df2.collect_nowait()
>>> async_res1 = async_job1.result()
>>> async_res2 = async_job2.result()
>>> time2 = time() - start
>>> time2 < time1
True
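
The same pattern extends to more than two queries: submit them all first, then gather the results. A minimal sketch (the number of queries and the wait time are illustrative):

>>> dataframes = [session.sql("select SYSTEM$WAIT(2)") for _ in range(3)]
>>> jobs = [d.collect_nowait() for d in dataframes]  # all three queries start running concurrently
>>> results = [job.result() for job in jobs]  # total wait is roughly one query's duration, not the sum
>>> len(results)
3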
Example 10

Creating an AsyncJob from an existing query ID, retrieving its results, and converting it back to a DataFrame:

>>> from snowflake.snowpark.functions import col
>>> query_id = session.sql("select 1 as A, 2 as B, 3 as C").collect_nowait().query_id
>>> async_job = session.create_async_job(query_id)
>>> async_job.query
'select 1 as A, 2 as B, 3 as C'
>>> async_job.result()
[Row(A=1, B=2, C=3)]
>>> async_job.result(result_type="pandas")
   A  B  C
0  1  2  3
>>> df = async_job.to_df()
>>> df.select(col("A").as_("D"), "B").collect()
[Row(D=1, B=2)]

Note

  • If a dataframe is associated with multiple queries:

    • if you use Session.create_dataframe() to create a dataframe from a large amount of local data and then evaluate this dataframe asynchronously, the local data is still loaded into Snowflake synchronously; only fetching the data back from Snowflake is performed asynchronously (see the sketch after these notes).

    • otherwise, multiple queries will be wrapped into a Snowflake Anonymous Block and executed asynchronously as one query.

  • Temporary objects (e.g., tables) might be created when evaluating dataframes. When you call a synchronous API, they are dropped automatically after all queries finish; when you evaluate dataframes asynchronously, they are dropped only after you call result().

  • This feature is currently not supported in Snowflake Python stored procedures.
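
As a rough illustration of the first two notes, a sketch (the data volume is only illustrative):

>>> local_df = session.create_dataframe([[i, -i] for i in range(100000)], schema=["a", "b"])
>>> # The local data is uploaded to Snowflake synchronously when the dataframe is evaluated;
>>> # only the final fetch of the count below runs asynchronously.
>>> async_job = local_df.count(block=False)
>>> async_job.result()  # any temporary objects created for the upload are dropped only after this call
100000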

Methods

cancel()

Cancels the query associated with this instance.

is_done()

Checks the status of the query associated with this instance and returns a bool value indicating whether the query has finished.

result([result_type])

Blocks and waits until the query associated with this instance finishes, then returns query results.

to_df()

Returns a DataFrame built from the result of this asynchronous job.

Attributes

query

The SQL text of the executed query.

query_id

The query ID of the executed query.