Manage task dependencies with task graphs

With task graphs you can automatically run sequences of tasks. A task graph, or directed acyclic graph (DAG), is a series of tasks composed of a root task and child tasks, organized by their dependencies. Task graphs flow in a single direction, meaning a task later in the series cannot prompt the run of an earlier task. Each task can depend on multiple other tasks and won’t run until they all complete. Each task can also have multiple child tasks that depend on it.

Tasks in a task graph can also use the return values of parent tasks to perform logic-based operations in their SQL function body.
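As a minimal sketch of passing a value from a parent task to a child task, assuming the SYSTEM$SET_RETURN_VALUE and SYSTEM$GET_PREDECESSOR_RETURN_VALUE system functions; the warehouse, task, and table names are hypothetical:

```sql
-- Hypothetical names: my_wh, load_task, report_task, task_log.

-- Parent task: stores a string that child tasks can read.
CREATE OR REPLACE TASK load_task
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  CALL SYSTEM$SET_RETURN_VALUE('load complete');

-- Child task: reads the value set by its single predecessor.
CREATE OR REPLACE TASK report_task
  WAREHOUSE = my_wh
  AFTER load_task
AS
  INSERT INTO task_log (message)
    SELECT SYSTEM$GET_PREDECESSOR_RETURN_VALUE();
```

With more than one predecessor, the child task would likely need to name the predecessor whose value it wants; this sketch assumes a single parent.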

Create a task graph

To create a task graph, specify parent tasks when you create or change a task. The root task of your graph is the task with no parent tasks. The root task should have a defined schedule that initiates a run of the task graph. Each child task must have at least one defined parent task to link the tasks in the task graph.

Use the CREATE TASK … AFTER or ALTER TASK … ADD AFTER commands to add child tasks. You can also manage your Snowflake tasks and task graphs with Python. For more information, see Managing Snowflake tasks and task graphs with Python.
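The two commands can be sketched as follows. The warehouse and task names are hypothetical, and the SELECT 1 bodies are placeholders for real work:

```sql
-- Hypothetical names: my_wh, root_task, child_one, child_two.

-- Root task: has a schedule and no AFTER clause.
CREATE OR REPLACE TASK root_task
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  SELECT 1;

-- Child tasks linked to the root task at creation time.
CREATE OR REPLACE TASK child_one
  WAREHOUSE = my_wh
  AFTER root_task
AS
  SELECT 1;

CREATE OR REPLACE TASK child_two
  WAREHOUSE = my_wh
  AFTER root_task
AS
  SELECT 1;

-- Add a second parent to an existing child task.
-- child_two now waits for both root_task and child_one.
ALTER TASK child_two ADD AFTER child_one;
```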

Task graph considerations
  • A task graph is limited to a maximum of 1000 tasks.

  • A single task can have a maximum of 100 parent tasks and 100 child tasks.

  • The compute running the task graph must be sized to handle concurrent task runs. For more information, see Compute resources.

In the following example, the root task prompts Tasks B and C to run simultaneously. Task D runs when both Tasks B and C have completed their runs.

A diamond-shaped task graph with task A on the left, labeled root task, pointing to both task B and C in the middle. Tasks B and C point to task D on the right.
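One way to define the diamond-shaped graph described above, as a sketch with hypothetical names (my_wh, task_a through task_d) and placeholder task bodies:

```sql
-- Root task A starts the graph on a schedule.
CREATE OR REPLACE TASK task_a
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  SELECT 1;

-- Tasks B and C each depend only on A, so they can run at the same time.
CREATE OR REPLACE TASK task_b
  WAREHOUSE = my_wh
  AFTER task_a
AS
  SELECT 1;

CREATE OR REPLACE TASK task_c
  WAREHOUSE = my_wh
  AFTER task_a
AS
  SELECT 1;

-- Task D lists both B and C as parents and waits for both to complete.
CREATE OR REPLACE TASK task_d
  WAREHOUSE = my_wh
  AFTER task_b, task_c
AS
  SELECT 1;
```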

The following example shows how you can use a task graph to update dimension tables in a sales database before aggregating fact data:

A diamond-shaped task graph with a task in the center. The root task on the left points to the Update Customer Table, Update Product Table, and Update Date and Time Table tasks. Those three tasks then point to the Aggregate Sales Table task on the right.

This example shows the concluding task in a task graph calling an external function to prompt a remote messaging service to send a notification that all previous tasks have run successfully to completion.

A task graph with the intermediate tasks skipped using an ellipsis. The root task on the left points to the ellipsis which points to the final Send Notification via External Function task on the right.

Manage task graph ownership

All tasks in a task graph must have the same task owner and be stored in the same database and schema.

You can transfer ownership of all tasks in a task graph using one of the following actions:

  • Drop the owner of all tasks in the task graph using DROP ROLE. Snowflake transfers ownership to the role that runs the DROP ROLE command.

  • Transfer ownership of all tasks in the task graph using GRANT OWNERSHIP on all tasks in a schema.

When you transfer ownership of the tasks in a task graph using these methods, the tasks in the task graph retain their relationships to each other.
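A sketch of the schema-wide transfer, assuming a hypothetical schema my_db.my_schema and a hypothetical target role new_task_owner:

```sql
-- Transfers every task in the schema in one statement, which keeps
-- the parent and child links between the tasks intact.
GRANT OWNERSHIP ON ALL TASKS IN SCHEMA my_db.my_schema
  TO ROLE new_task_owner;
```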

Transferring ownership of a single task removes the dependency between the task and any parent and child tasks. For more information, see Unlink parent and child tasks (in this topic).

Running tasks in a task graph

A task graph run starts with a run of its defined root task. A successful run of a root task triggers a cascading run of child tasks in the task graph as their predecessor tasks complete. Root tasks can be run in the following ways:

  • Task scheduling - Generally, task graphs are run on a CRON or interval-based schedule.

  • ALTER TASK - You can use ALTER TASK [ IF EXISTS ] <name> RESUME to run a task graph based on its existing schedule. Newly created tasks are suspended, so each task must be resumed before it can run.

  • EXECUTE TASK - You can use EXECUTE TASK <name> to create a one-time run of a task graph.
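The first two options can be sketched as follows. The names (my_wh, root_task, child_task) and the CRON expression are hypothetical:

```sql
-- Root task scheduled with a CRON expression: every day at 06:00 UTC.
CREATE OR REPLACE TASK root_task
  WAREHOUSE = my_wh
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  SELECT 1;

CREATE OR REPLACE TASK child_task
  WAREHOUSE = my_wh
  AFTER root_task
AS
  SELECT 1;

-- New tasks start out suspended. Resume the child task first,
-- then the root task, so the schedule takes effect.
ALTER TASK child_task RESUME;
ALTER TASK root_task RESUME;
```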

Manually running tasks in a task graph

The EXECUTE TASK command manually triggers a single run of a task independent of the schedule defined for the task. A successful run of a root task triggers a cascading run of child tasks in the task graph as their precedent task completes, as though the root task had run on its defined schedule.

You can also use EXECUTE TASK <name> RETRY LAST to rerun any child task in a task graph. RETRY LAST attempts to run the task graph from the last failed task. If the task succeeds, all child tasks continue to run as their predecessor tasks complete.

This SQL command is useful for testing new or modified task graphs before you enable them to execute SQL code in production.
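A sketch of a manual test run followed by a retry, assuming a hypothetical root task named root_task. Passing the root task name to RETRY LAST is an assumption of this sketch:

```sql
-- Trigger a single run of the task graph, independent of its schedule.
EXECUTE TASK root_task;

-- If that run failed, retry the graph from the last failed task.
EXECUTE TASK root_task RETRY LAST;
```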

Overlapping task graph runs

By default, Snowflake ensures that only one instance of a particular task graph is allowed to run at a time. The next run of a root task is scheduled only after all tasks in the task graph have finished running. This means that if the cumulative time required to run all tasks in the task graph exceeds the explicit scheduled time set in the definition of the root task, at least one run of the task graph is skipped. The behavior is controlled by the ALLOW_OVERLAPPING_EXECUTION parameter on the root task; the default value is FALSE. Setting the parameter value to TRUE permits task graph runs to overlap.

In addition, a child task begins its run only after all predecessor tasks for the child task have successfully completed their own runs. A task that executes time-intensive SQL operations delays the start of any child task that identifies the task as a predecessor.

In the following example, a task graph run is scheduled to start when a prior run has not completed yet. The period of overlap, or concurrency, is identified in red. The diagram also identifies the span of time when each task is queued before running in the user-managed warehouse. Note that if you use serverless compute resources, there is no queuing period:

Overlapping task graph runs

Overlapping runs may be tolerated (or even desirable) when read/write SQL operations executed by overlapping runs of a task graph do not produce incorrect or duplicate data. However, for other task graphs, task owners (the role with the OWNERSHIP privilege on all tasks in the task graph) should set an appropriate schedule on the root task and choose an appropriate warehouse size (or use serverless compute resources) to ensure an instance of the task graph finishes to completion before the root task is next scheduled to run.

To better align a task graph with the schedule defined in the root task:

  1. If feasible, increase the scheduling time between runs of the root task.

  2. Consider modifying compute-heavy tasks to use serverless compute resources. If the task relies on user-managed compute resources, increase the size of the warehouse that runs large or complex SQL statements or stored procedures in the task graph.

  3. Analyze the SQL statements or stored procedure executed by each task. Determine if code could be rewritten to leverage parallel processing.

If none of the above solutions help, consider whether it is necessary to allow concurrent runs of the task graph by setting ALLOW_OVERLAPPING_EXECUTION = TRUE on the root task. This parameter can be defined when creating a task (using CREATE TASK) or later (using ALTER TASK or in Snowsight).
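For an existing graph, the change might look like this sketch, assuming a hypothetical root task named root_task:

```sql
-- The root task is suspended before it is modified.
ALTER TASK root_task SUSPEND;

-- Allow a new run of the graph to start while a prior run is still in progress.
ALTER TASK root_task SET ALLOW_OVERLAPPING_EXECUTION = TRUE;

ALTER TASK root_task RESUME;
```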

Suspending and resuming tasks in a task graph

To suspend or resume a task in a task graph, use ALTER TASK … RESUME | SUSPEND or Snowsight.

Suspending a root task

When the root task is suspended, all future scheduled runs of the root task are cancelled; however, if any tasks are currently running, these tasks and any descendant tasks continue to run.

Resuming and suspending child tasks

To resume or suspend a child task, you must first suspend the root task. You do not need to resume suspended child tasks before you resume the root task.
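The order of operations can be sketched as follows, with hypothetical task names root_task and child_task:

```sql
-- 1. Suspend the root task so that no new runs are scheduled.
ALTER TASK root_task SUSPEND;

-- 2. Change the state of the child task.
ALTER TASK child_task SUSPEND;

-- 3. Resume the root task. The child task can stay suspended.
ALTER TASK root_task RESUME;
```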

Recursively resume tasks

To recursively resume all tasks in a task graph, query the SYSTEM$TASK_DEPENDENTS_ENABLE function.
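For example, assuming a hypothetical root task my_db.my_schema.root_task:

```sql
-- Resumes the named root task and every task that depends on it.
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('my_db.my_schema.root_task');
```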

Task graph runs with suspended child tasks

When a task graph runs with one or more suspended child tasks, the run ignores those tasks. A child task with multiple predecessors runs as long as at least one of the predecessors is in a resumed state, and all resumed predecessors run successfully to completion.

Task graph versioning

When the root task in a task graph is resumed or manually executed, Snowflake sets a version of the entire task graph, including all properties for all tasks in the task graph. After a task is suspended and modified, Snowflake sets a new version when the root task is resumed or manually executed.

To modify or recreate any task in a task graph, the root task must first be suspended. When the root task is suspended, all future scheduled runs of the root task are cancelled; however, if any tasks are currently running, these tasks and any descendant tasks continue to run using the current version.
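A sketch of the suspend, modify, resume sequence, using hypothetical task names and a placeholder task body. It assumes that the child task itself does not need to be suspended once the root task is:

```sql
-- Suspend the root task before changing any task in the graph.
ALTER TASK root_task SUSPEND;

-- Replace the SQL that the child task runs.
ALTER TASK child_task MODIFY AS
  SELECT 2;

-- Resuming the root task sets a new version of the task graph
-- that includes the modified child task.
ALTER TASK root_task RESUME;
```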

Note

If the definition of a stored procedure called by a task changes while the task graph is executing, the new definition could be executed when the stored procedure is called by the task in the current run.

For example, suppose the root task in a task graph is suspended, but a scheduled run of this task has already started. The owner of all tasks in the task graph modifies the SQL code called by a child task while the root task is still running. The child task runs and executes the SQL code in its definition using the version of the task graph that was current when the root task started its run. When the root task is resumed or is manually executed, a new version of the task graph is set. This new version includes the modifications to the child task.

To retrieve the history of task versions, query the TASK_VERSIONS Account Usage view (in the SNOWFLAKE shared database).
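A query along these lines could list the recorded versions. The database and schema names are hypothetical, and the selected column names are assumptions to check against the view's reference page:

```sql
-- Assumed columns: name, graph_version, graph_version_created_on,
-- database_name, schema_name.
SELECT name,
       graph_version,
       graph_version_created_on
FROM SNOWFLAKE.ACCOUNT_USAGE.TASK_VERSIONS
WHERE database_name = 'MY_DB'
  AND schema_name = 'MY_SCHEMA'
ORDER BY graph_version_created_on DESC;
```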

Automatically suspend task graphs after failed task runs

Optionally suspend task graphs automatically after a specified number of consecutive task runs that either fail or time out.

Set the SUSPEND_TASK_AFTER_NUM_FAILURES = num parameter on the root task in a task graph. When the parameter is set to a value greater than 0, the root task is automatically suspended after any child task in the task graph consecutively fails or times out the specified number of times. The child task that fails or times out is not suspended.
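For example, with a hypothetical root task named root_task and a threshold of three consecutive failures:

```sql
ALTER TASK root_task SUSPEND;

-- Suspend the root task after any task in the graph fails
-- or times out three times in a row.
ALTER TASK root_task SET SUSPEND_TASK_AFTER_NUM_FAILURES = 3;

ALTER TASK root_task RESUME;
```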

Automatically retry failed task graph runs

You can specify the number of automatic task graph retry attempts. If a task graph completes in a FAILED state, Snowflake can automatically retry the task graph from the last task in the graph that failed.

The automatic task graph retry is disabled by default. To enable this feature, set TASK_AUTO_RETRY_ATTEMPTS to a value greater than 0.
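A sketch that sets the parameter when the root task is created. The names are hypothetical, and setting the parameter on the root task is an assumption of this example:

```sql
-- Hypothetical names: retry_root, my_wh.
CREATE TASK retry_root
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
  TASK_AUTO_RETRY_ATTEMPTS = 2  -- retry a failed graph run up to two times
AS
  SELECT 1;
```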

View dependent tasks in a task graph

To view the child tasks for a root task, query the TASK_DEPENDENTS table function. To retrieve all tasks in a task graph, input the root task when calling the function.
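For example, assuming a hypothetical root task my_db.my_schema.root_task:

```sql
-- RECURSIVE => TRUE returns all descendants of the named task;
-- FALSE limits the result to its direct child tasks.
SELECT *
FROM TABLE(
  INFORMATION_SCHEMA.TASK_DEPENDENTS(
    TASK_NAME => 'my_db.my_schema.root_task',
    RECURSIVE => TRUE
  )
);
```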

You can also use Snowsight to manage and view your task graphs. For more information, see Viewing tasks and task graphs in Snowsight.

Release and cleanup of task graphs

A finalizer task handles the release and cleanup of resources that a task graph uses. The finalizer task is guaranteed to run if the task graph is executed and ensures proper resource cleanup and completion of necessary steps in all scenarios. For example, if a task graph run uses intermediate tables to track data for processing and fails before the table rows are consumed, the next run will encounter duplicate rows and reprocess data resulting in longer execution time or wasting compute resources. The finalizer task can address this issue by dropping the rows or truncating the table as needed.

The finalizer task works like any other task in a task graph, with the following differences:

  • A finalizer task is always associated with a root task. Each root task can only have one finalizer task, and a finalizer task can only be associated with one root task.

  • A finalizer task is scheduled only when no other tasks are running or queued in the current task graph run, and at least one task in the graph has begun execution. If a graph is skipped (for example, the root task is skipped), the finalizer task will not run. If ALLOW_OVERLAPPING_EXECUTION is true, the finalizer task will behave like the other tasks and will still be scheduled even if there are other ongoing task graph runs.

  • A finalizer task cannot have any child tasks. Any command that tries to make the finalizer task a predecessor will fail. The creation of a finalizer task must include the FINALIZE keyword, which is incompatible with both the SCHEDULE and AFTER keywords.

To create a finalizer task, create a task using the FINALIZE keyword and set a relationship to the root task:

CREATE TASK <TASK_NAME> ...
... FINALIZE = <ROOT_TASK_NAME>

For more information, see CREATE TASK.
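As a more concrete sketch of the intermediate-table scenario described earlier, with hypothetical names (my_wh, root_task, cleanup_task, staging_rows). It assumes that the root task is suspended while the finalizer is added:

```sql
-- Finalizer task: uses FINALIZE instead of SCHEDULE or AFTER.
CREATE TASK cleanup_task
  WAREHOUSE = my_wh
  FINALIZE = root_task
AS
  TRUNCATE TABLE staging_rows;

-- Like any new task, the finalizer starts out suspended.
ALTER TASK cleanup_task RESUME;
ALTER TASK root_task RESUME;
```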