Create a sequence of tasks with a task graph
In Snowflake, you can manage multiple tasks with a task graph, also known as a directed acyclic graph (DAG). A task graph is composed of a root task and dependent child tasks. The dependencies must run in a start-to-finish direction, with no loops. An optional final task, called a finalizer, can perform cleanup operations after all other tasks are complete.
Build task graphs that have dynamic behavior by specifying logic-based operations in the task body using runtime values, graph level configuration, and return values of parent tasks.
You can create tasks and task graphs using supported languages and tools like SQL, JavaScript, Python, Java, Scala, or Snowflake Scripting. This topic provides SQL examples. For Python examples, see Managing Snowflake tasks and task graphs with Python.
You can also use Snowsight to manage and view your task graphs. For more information, see View tasks and task graphs in Snowsight.
Create a task graph
Create a root task using CREATE TASK, then create child tasks using CREATE TASK … AFTER to select the parent tasks.
The root task defines when the task graph runs. Child tasks are executed in the order defined by the task graph.
When multiple child tasks have the same parent, the child tasks run in parallel.
When a task has multiple parents, the task waits for all preceding tasks to successfully complete before starting. (The task may also run when some parent tasks are skipped. For more information, see Skip or suspend a child task.)
The following example creates a serverless task graph that starts with a root task that is scheduled to run every minute. The root task has two child tasks that run in parallel. (The diagram shows an example where one of these tasks runs longer than the other.) After both of those tasks complete, a third child task runs. The finalizer task runs after all other tasks complete or fail to complete:
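A minimal sketch of the graph described above. The task names (task_root, task_a, task_b, task_c, task_finalizer) and the placeholder SELECT 1 bodies are illustrative assumptions; no WAREHOUSE parameter is set, so the tasks use serverless compute.

```sql
-- Root task: defines the schedule for the whole task graph.
CREATE OR REPLACE TASK task_root
  SCHEDULE = '1 MINUTE'
  AS SELECT 1;

-- task_a and task_b share the same parent, so they run in parallel.
CREATE OR REPLACE TASK task_a
  AFTER task_root
  AS SELECT 1;

CREATE OR REPLACE TASK task_b
  AFTER task_root
  AS SELECT 1;

-- task_c has two parents and waits for both to complete.
CREATE OR REPLACE TASK task_c
  AFTER task_a, task_b
  AS SELECT 1;

-- Finalizer: runs after all other tasks complete or fail to complete.
CREATE OR REPLACE TASK task_finalizer
  FINALIZE = task_root
  AS SELECT 1;
```

Newly created tasks start in a suspended state, so nothing runs until the tasks are resumed or the root task is executed manually.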
Considerations:
A task graph is limited to a maximum of 1000 tasks.
A single task can have a maximum of 100 parent tasks and 100 child tasks.
When tasks run in parallel on the same user-managed warehouse, the compute resources must be sized to handle the concurrent task runs.
Finalizer task
You can add an optional finalizer task to run after all other tasks in the task graph complete (or fail to complete). Use this to do the following:
Perform cleanup operations, for example, cleaning up intermediate data that is no longer needed.
Send notifications about task success or failure.
To create a finalizer task, use CREATE TASK … FINALIZE … on the root task. Example:
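A minimal sketch, assuming a suspended root task named task_root already exists. The finalizer name and the intermediate_results table are hypothetical.

```sql
-- FINALIZE associates this task with the root task of the graph.
-- The body cleans up a hypothetical intermediate table.
CREATE OR REPLACE TASK task_finalizer
  FINALIZE = task_root
  AS DROP TABLE IF EXISTS intermediate_results;
```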
Considerations:
A finalizer task is always associated with a root task. Each root task can have only one finalizer task, and a finalizer task can be associated with only one root task.
When the root task of a task graph is skipped (for example, because of overlapping task graph runs), the finalizer task won’t be started.
A finalizer task cannot have any child tasks.
A finalizer task is scheduled only when no other tasks are running or queued in the current task graph run.
For more examples, see Finalizer task example: Send email notification and Finalizer task example: Correct for errors.
Manage task graph ownership
All tasks in a task graph must have the same task owner and be stored in the same database and schema.
You can transfer ownership of all tasks in a task graph using one of the following actions:
Drop the owner of all tasks in the task graph using DROP ROLE. Snowflake transfers ownership to the role that runs the DROP ROLE command.
Transfer ownership of all tasks in the task graph using GRANT OWNERSHIP on all tasks in a schema.
When you transfer ownership of the tasks in a task graph using these methods, the tasks in the task graph retain their relationships to each other.
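As a sketch of the second method, the statement below transfers every task in a schema at once. The schema name mydb.myschema and the role new_task_owner are hypothetical, and COPY CURRENT GRANTS is an optional choice made here to preserve existing grants.

```sql
-- Transfers all tasks in the schema together, so the task graph
-- keeps its parent/child relationships.
GRANT OWNERSHIP ON ALL TASKS IN SCHEMA mydb.myschema
  TO ROLE new_task_owner
  COPY CURRENT GRANTS;
```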
Transferring ownership of a single task removes the dependency between the task and any parent and child tasks. For more information, see Unlink parent and child tasks (in this topic).
Note
Database replication does not work for task graphs if the graph is owned by a different role than the role that performs replication.
Run or schedule tasks in a task graph
Run a task graph manually
You can run a single instance of a task graph. This is useful for testing new or modified task graphs before enabling the task graph in production, or for one-time runs as needed.
Before starting the task graph, use ALTER TASK … RESUME on each child task (including the optional finalizer task) that you want to include in the run.
To run a single instance of a task graph, use EXECUTE TASK on the root task. When you run the root task, all resumed child tasks in the task graph are executed in the order defined by the task graph.
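For illustration, assuming a graph with the hypothetical task names task_root, task_a, task_b, task_c, and task_finalizer, a one-time run could look like this:

```sql
-- Resume each child task (and the finalizer) that should take part in the run.
ALTER TASK task_a RESUME;
ALTER TASK task_b RESUME;
ALTER TASK task_c RESUME;
ALTER TASK task_finalizer RESUME;

-- Start a single run of the task graph from the root task.
EXECUTE TASK task_root;
```

The root task itself can stay suspended for a manual run; only resumed child tasks are included.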
Run a task on a schedule or as a triggered task
In the root task, define when the task graph runs. Task graphs can run on a recurring schedule, or they can be triggered by an event. For more information, see the following topics:
To start the task graph, you can do either of the following:
Resume each individual child task (including the finalizer) that you want to include in the run, and then resume the root task, using ALTER TASK … RESUME.
Resume all of the tasks in a task graph at once using SYSTEM$TASK_DEPENDENTS_ENABLE ( <root_task_name> ) on the root task.
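A short sketch of the second option. The fully qualified name mydb.myschema.task_root is a hypothetical example.

```sql
-- Resumes the root task and all of its dependent tasks in one call.
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('mydb.myschema.task_root');
```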
View dependent tasks in a task graph
To view the child tasks for a root task, call the TASK_DEPENDENTS table function. To retrieve all tasks in a task graph, input the root task when calling the function.
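As a sketch, the query below lists every task in a graph by passing the root task. The task name is hypothetical, and the selected columns (name, predecessors, state) are assumed to match the SHOW TASKS-style output of the function.

```sql
-- RECURSIVE => TRUE returns all descendants, not only direct children.
SELECT name, predecessors, state
FROM TABLE(
  INFORMATION_SCHEMA.TASK_DEPENDENTS(
    TASK_NAME => 'mydb.myschema.task_root',
    RECURSIVE => TRUE
  )
);
```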
You can also use Snowsight to manage and view your task graphs. For more information, see View tasks and task graphs in Snowsight.
Modify, suspend, or retry tasks
Modify a task in a task graph
To modify a task in a scheduled task graph, suspend the root task using ALTER TASK … SUSPEND. If a run of the task graph is in progress, it completes the current run. All future scheduled runs of the root task are canceled.
When the root task is suspended, child tasks including the finalizer task retain their state (suspended, running, or completed). The child tasks don’t need to be individually suspended.
After you suspend the root task, you can modify any task in the task graph.
To resume the task graph, you can do either of the following:
Resume the root task using ALTER TASK … RESUME. Individual child tasks that were running before do not need to be resumed.
Resume all of the tasks in a task graph at once by calling SYSTEM$TASK_DEPENDENTS_ENABLE and passing in the name of the root task.
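The suspend, modify, resume cycle might look like the following sketch. The task names and the new task body are illustrative assumptions.

```sql
-- 1. Suspend the root task; any run already in progress finishes.
ALTER TASK task_root SUSPEND;

-- 2. Modify a task in the graph, here by replacing the body of task_b.
ALTER TASK task_b MODIFY AS SELECT 2;

-- 3. Resume the root task; child tasks that were resumed before stay resumed.
ALTER TASK task_root RESUME;
```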
Skip or suspend a child task
To skip a child task in a task graph, suspend the child task using ALTER TASK … SUSPEND.
When you suspend a child task, the task graph continues to run as though the child task had succeeded. A child task with multiple predecessors runs as long as at least one of the predecessors is in a resumed state, and all resumed predecessors run successfully to completion.
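A minimal sketch, assuming hypothetical tasks task_root and task_b. Suspending the root task first is a conservative choice made here, since changes to a graph generally require a suspended root task.

```sql
ALTER TASK task_root SUSPEND;

-- Later runs treat task_b as though it had succeeded.
ALTER TASK task_b SUSPEND;

ALTER TASK task_root RESUME;
```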
Retry a failed task
Use EXECUTE TASK … RETRY LAST to attempt to run the task graph from the last failed task. If the task succeeds, all child tasks will continue to run as their preceding tasks complete.
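For example, assuming the root task of the failed graph is named task_root (a hypothetical name):

```sql
-- Re-runs the most recent failed graph run, starting at the last failed task.
EXECUTE TASK task_root RETRY LAST;
```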
Automatic retries
By default, if a child task fails, the entire task graph is considered to have failed.
Rather than waiting until the next scheduled task graph run, you can instruct the task graph to retry immediately by setting the TASK_AUTO_RETRY_ATTEMPTS parameter on the root task. When a child task fails, the entire task graph is immediately retried, up to the number of times specified. If the task graph still doesn’t complete, the task graph is considered to have failed.
Suspend task graphs after failed task graph runs
By default, a task graph is suspended after 10 consecutive failures. You can change this value by setting SUSPEND_TASK_AFTER_NUM_FAILURES on the root task.
In the following example, whenever a child task fails, the task graph immediately retries twice before the entire task graph is considered failed. If the task graph fails three times in a row, the task graph is then suspended.
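A sketch of a root task with these settings. The task name, schedule, and placeholder body are assumptions; the two parameters are the ones named above.

```sql
CREATE OR REPLACE TASK task_root
  SCHEDULE = '1 MINUTE'
  TASK_AUTO_RETRY_ATTEMPTS = 2         -- retry a failed graph run up to 2 times
  SUSPEND_TASK_AFTER_NUM_FAILURES = 3  -- suspend after 3 consecutive failed runs
  AS SELECT 1;
```

Create the root task this way before adding child tasks, because replacing an existing root task drops it and severs its links.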
Unlink parent and child tasks
Dependencies between tasks in a task graph can be severed as a result of the following actions:
ALTER TASK … REMOVE AFTER and ALTER TASK … UNSET FINALIZE remove the link between the target task and the specified parent tasks or finalized root task.
DROP TASK and GRANT OWNERSHIP sever all the target task’s links. For example, root task A has child task B, and task B has child task C. If you drop task B, the link between task A and B is severed and so is the link between task B and C.
If any combination of the above actions severs the relationship between the child task and all parent tasks, the child task becomes either a standalone task or a root task.
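The two ALTER TASK forms can be sketched as follows, assuming the hypothetical task names used in this topic and a root task that is suspended first:

```sql
ALTER TASK task_root SUSPEND;

-- Remove task_a as a parent of task_c; task_c keeps its other parents.
ALTER TASK task_c REMOVE AFTER task_a;

-- Detach the finalizer from its root task.
ALTER TASK task_finalizer UNSET FINALIZE;
```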
Note
If you grant the ownership of a task to its current owner, dependency links might not be severed.
Overlap task graph runs
By default, Snowflake ensures that only one instance of a particular task graph is allowed to run at a time. The next run of a root task is scheduled only after all tasks in the task graph have finished running. This means that if the cumulative time required to run all tasks in the task graph exceeds the explicit scheduled time set in the definition of the root task, at least one run of the task graph is skipped.
To allow child tasks to overlap, use CREATE TASK or ALTER TASK on the root task, and set ALLOW_OVERLAPPING_EXECUTION to TRUE. (Root tasks never overlap.)
Overlapping runs may be tolerated (or even desirable) when read/write SQL operations executed by overlapping runs of a task graph do not produce incorrect or duplicate data. However, for other task graphs, task owners (the role with the OWNERSHIP privilege on all tasks in the task graph) should set an appropriate schedule on the root task and choose an appropriate warehouse size (or use serverless compute resources) to ensure an instance of the task graph finishes to completion before the root task is next scheduled to run.
To better align a task graph with the schedule defined in the root task:
If feasible, increase the scheduling time between runs of the root task.
Consider modifying compute-heavy tasks to use serverless compute resources. If the task relies on user-managed compute resources, increase the size of the warehouse that runs large or complex SQL statements or stored procedures in the task graph.
Analyze the SQL statements or stored procedure executed by each task. Determine if code can be rewritten to leverage parallel processing.
If none of the above solutions help, consider whether it is necessary to allow concurrent runs of the task graph by setting ALLOW_OVERLAPPING_EXECUTION = TRUE on the root task. This parameter can be defined when creating a task (using CREATE TASK) or later (using ALTER TASK or in Snowsight).
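A minimal sketch of enabling overlapping runs on an existing root task. The task name task_root is hypothetical.

```sql
ALTER TASK task_root SUSPEND;

-- Allow a new graph run to start while child tasks of a previous run are still running.
ALTER TASK task_root SET ALLOW_OVERLAPPING_EXECUTION = TRUE;

ALTER TASK task_root RESUME;
```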
Versioning
When the root task in a task graph is resumed or manually executed, Snowflake sets a version of the entire task graph, including all properties for all tasks in the task graph. After a task is suspended and modified, Snowflake sets a new version when the root task is resumed or manually executed.
To modify or recreate any task in a task graph, the root task must first be suspended. When the root task is suspended, all future scheduled runs of the root task are canceled; however, if any tasks are currently running, these tasks and any descendant tasks continue to run using the current version.
Note
If the definition of a stored procedure called by a task changes while the task graph is executing, the new definition could be executed when the stored procedure is called by the task in the current run.
For example, suppose the root task in a task graph is suspended, but a scheduled run of this task has already started. The owner of all tasks in the task graph modifies the SQL code called by a child task while the root task is still running. The child task runs and executes the SQL code in its definition using the version of the task graph that was current when the root task started its run. When the root task is resumed or is manually executed, a new version of the task graph is set. This new version includes the modifications to the child task.
To retrieve the history of task versions, query the TASK_VERSIONS Account Usage view (in the SNOWFLAKE shared database).
Task graph duration
Task graph duration includes the time from when the root task is scheduled to start to when the last child task completes. To calculate the duration of a task graph, query the COMPLETE_TASK_GRAPHS view, and compare SCHEDULED_TIME with COMPLETED_TIME.
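A sketch of such a query. The NAME and GRAPH_VERSION columns and the task name TASK_ROOT are assumptions; check the view's column list before relying on them.

```sql
-- List recorded versions for a task, newest first.
-- Column names are assumed, not taken from this topic.
SELECT *
FROM snowflake.account_usage.task_versions
WHERE name = 'TASK_ROOT'
ORDER BY graph_version DESC
LIMIT 20;
```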
For example, the following diagram shows a task graph that is scheduled to run every minute. The root task and its two child tasks each queue for 5 seconds and run for 10 seconds, requiring a total of 45 seconds to complete.
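The duration calculation can be sketched as a query against the Account Usage view. SCHEDULED_TIME and COMPLETED_TIME come from the text above; the ROOT_TASK_NAME filter and the task name are assumptions.

```sql
-- Duration of recent graph runs, in seconds, from scheduled start to completion.
SELECT
  root_task_name,
  scheduled_time,
  completed_time,
  DATEDIFF('second', scheduled_time, completed_time) AS duration_seconds
FROM snowflake.account_usage.complete_task_graphs
WHERE root_task_name = 'TASK_ROOT'
ORDER BY scheduled_time DESC
LIMIT 10;
```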
Task graph timeouts
When USER_TASK_TIMEOUT_MS is set in the root task, the timeout applies to the entire task graph.
When USER_TASK_TIMEOUT_MS is set in a child task or finalizer task, the timeout applies to only that task.
When USER_TASK_TIMEOUT_MS is set in both the root task and a child task, the child task timeout overrides the root task timeout for that child task.
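A sketch of both settings together. The task names and timeout values are illustrative assumptions.

```sql
-- Root task timeout: applies to the entire task graph (1 hour here).
CREATE OR REPLACE TASK task_root
  SCHEDULE = '1 MINUTE'
  USER_TASK_TIMEOUT_MS = 3600000
  AS SELECT 1;

-- Child task timeout: overrides the root task timeout for task_a only (10 minutes).
CREATE OR REPLACE TASK task_a
  USER_TASK_TIMEOUT_MS = 600000
  AFTER task_root
  AS SELECT 1;
```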
Considerations
For serverless tasks, Snowflake automatically scales resources to make sure tasks complete within a target completion interval, including queueing time.
For user-managed tasks, longer queueing periods are common when tasks are scheduled to run on a shared or busy warehouse.
For task graphs, the total time might include additional queueing time for child tasks waiting for their predecessors to complete.
Create a task graph with logic (runtime info, configuration, and return values)
Tasks in a task graph can use return values from parent tasks to perform logic-based operations in their task body.
Considerations:
Some logic-based commands, like SYSTEM$GET_PREDECESSOR_RETURN_VALUE, are case sensitive. However, tasks created using CREATE TASK without quotes are stored and resolved in uppercase. To manage this, you can do any of the following:
Create task names using only uppercase letters.
Use quotes when naming and calling tasks.
For task names defined with lowercase characters, call the task using uppercase characters. For example: a task defined by "CREATE TASK task_c…" can be called as SELECT SYSTEM$GET_PREDECESSOR_RETURN_VALUE('TASK_C').
Pass configuration information to the task graph
You can pass configuration information by using a JSON object that can be read by other tasks in a task graph. Use the syntax CREATE TASK or ALTER TASK with the CONFIG parameter to set, unset, or modify the configuration information in the root task. Use the function SYSTEM$GET_TASK_GRAPH_CONFIG to retrieve the configuration information.
Example:
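A minimal sketch: the root task stores a JSON configuration and a child task reads one key from it. The task names, the JSON keys, and demo_table are hypothetical.

```sql
-- Root task: CONFIG holds a JSON object readable by every task in the graph.
CREATE OR REPLACE TASK task_root
  SCHEDULE = '1 MINUTE'
  CONFIG = $${"environment": "production", "path": "/prod_directory/"}$$
  AS SELECT 1;

-- Child task: reads the "path" key and records it in a table.
CREATE OR REPLACE TASK task_a
  AFTER task_root
  AS
    BEGIN
      LET path_value := (SELECT SYSTEM$GET_TASK_GRAPH_CONFIG('path'));
      CREATE TABLE IF NOT EXISTS demo_table (name VARCHAR, value VARCHAR);
      INSERT INTO demo_table VALUES ('task_a path', :path_value);
    END;
```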
Note
You can dynamically override the configuration for a single task execution with the EXECUTE TASK … USING CONFIG command. With this command, you can test different configurations or run ad-hoc executions with modified settings without changing the task definition.
Pass return values between tasks
You can pass return values between tasks in a task graph. Use the function SYSTEM$SET_RETURN_VALUE to add a return value from a task, and use the function SYSTEM$GET_PREDECESSOR_RETURN_VALUE to retrieve it.
When a task has multiple predecessors, you must specify which task has the return value that you want. In the following example, we create a root task in a task graph that adds configuration information.
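As a separate, minimal sketch of passing a return value, assume parent tasks task_a and task_b already exist. The task names, the return string, and demo_table are hypothetical. The predecessor is referenced in uppercase because the unquoted task name is stored in uppercase.

```sql
-- task_c sets a return value for its successors.
CREATE OR REPLACE TASK task_c
  AFTER task_a, task_b
  AS
    BEGIN
      CALL SYSTEM$SET_RETURN_VALUE('task_c successful');
    END;

-- task_d reads the value, naming the predecessor explicitly.
CREATE OR REPLACE TASK task_d
  AFTER task_c
  AS
    BEGIN
      LET predecessor_value := (SELECT SYSTEM$GET_PREDECESSOR_RETURN_VALUE('TASK_C'));
      CREATE TABLE IF NOT EXISTS demo_table (name VARCHAR, value VARCHAR);
      INSERT INTO demo_table VALUES ('task_d: value from task_c', :predecessor_value);
    END;
```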
Get and use runtime information
Use the function SYSTEM$TASK_RUNTIME_INFO to report information about the current task run. This function has several options specific to task graphs. For example, use CURRENT_ROOT_TASK_NAME to get the name of the root task in the current task graph. The following example shows how to add a date stamp to a table based on when the root task of the task graph started.
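A sketch of one way to do this. The task name, the run_log table, and the choice to store the timestamp as a string are assumptions; CURRENT_TASK_GRAPH_ORIGINAL_SCHEDULED_TIMESTAMP is the option used here for the scheduled start of the graph run.

```sql
CREATE OR REPLACE TASK task_date_time
  AFTER task_root
  AS
    BEGIN
      -- Scheduled start time of the current task graph run, returned as a string.
      LET graph_start := (SELECT SYSTEM$TASK_RUNTIME_INFO('CURRENT_TASK_GRAPH_ORIGINAL_SCHEDULED_TIMESTAMP'));
      CREATE TABLE IF NOT EXISTS run_log (name VARCHAR, graph_scheduled_at VARCHAR);
      INSERT INTO run_log VALUES ('task_date_time', :graph_start);
    END;
```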
Examples
Example: Start multiple tasks and report status
In the following example, the root task starts tasks to update three different tables. After those three tables are updated, a task combines the information from the other three tables into an aggregate sales table.
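The shape of this graph can be sketched as follows. All task names, table names (customers, products, orders, sales_summary), and column names are hypothetical stand-ins, not the tables from the original example.

```sql
CREATE OR REPLACE TASK task_root
  SCHEDULE = '1 MINUTE'
  AS SELECT 1;

-- Three child tasks with the same parent update their tables in parallel.
CREATE OR REPLACE TASK task_customers
  AFTER task_root
  AS UPDATE customers SET refreshed_at = CURRENT_TIMESTAMP();

CREATE OR REPLACE TASK task_products
  AFTER task_root
  AS UPDATE products SET refreshed_at = CURRENT_TIMESTAMP();

CREATE OR REPLACE TASK task_orders
  AFTER task_root
  AS UPDATE orders SET refreshed_at = CURRENT_TIMESTAMP();

-- The aggregate task waits for all three updates before it runs.
CREATE OR REPLACE TASK task_aggregate
  AFTER task_customers, task_products, task_orders
  AS
    INSERT INTO sales_summary (customer_rows, product_rows, order_rows, refreshed_at)
    SELECT
      (SELECT COUNT(*) FROM customers),
      (SELECT COUNT(*) FROM products),
      (SELECT COUNT(*) FROM orders),
      CURRENT_TIMESTAMP();
```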
Finalizer task example: Send email notification
This example demonstrates how to use a finalizer task to send an email summary of a task graph run. The finalizer task calls two external functions: one aggregates the completion status of each task, and the other formats the information into an email for a remote messaging service.
This example uses an example root task named task_root and an example finalizer task named notify_finalizer.
Finalizer task example: Correct for errors
This example demonstrates how a finalizer task can correct for errors.
For demonstration purposes, the tasks are designed to fail during their first run. The finalizer task corrects the issue and restarts the tasks, which succeed on following runs: