Introduction to Tasks

A task can execute any one of the following types of SQL code:

Tasks can be combined with table streams for continuous ELT workflows to process recently changed table rows. Streams ensure exactly once semantics for new or changed data in a table.

Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or perform other periodic work.

In this Topic:

Compute Resources

Tasks require compute resources to execute SQL code. Either of the following compute models can be chosen for individual tasks:

  • Snowflake-managed (i.e. serverless compute model)

  • User-managed (i.e. virtual warehouse)

Serverless Tasks

The serverless compute model for tasks enables you to rely on compute resources managed by Snowflake instead of user-managed virtual warehouses. The compute resources are automatically resized and scaled up or down by Snowflake as required for each workload. Snowflake determines the ideal size of the compute resources for a given run based on a dynamic analysis of statistics for the most recent previous runs of the same task. Multiple workloads in your account share a common set of compute resources.

The option to enable the serverless compute model must be specified when creating a task. The CREATE TASK syntax is nearly identical to tasks that rely on user-managed virtual warehouses. Omit the WAREHOUSE parameter to allow Snowflake to manage the compute resources for the task. Note that the role that executes the CREATE TASK command must have the global EXECUTE MANAGED TASK privilege. For more information about the access control requirements for tasks, see Task Security.

Billing for runs of serverless tasks differs somewhat from the standard credit consumption model for tasks that rely on warehouses for compute resources. For information, see Billing for Task Runs.

Note

Serverless tasks cannot invoke the following object types and functions:

  • UDFs (user-defined functions) that contain Java or Python code.

  • Stored procedures written in Scala (using Snowpark), or which call UDFs that contain Java or Python code.

  • The EXTRACT_SEMANTIC_CATEGORIES function.

User-managed Tasks

If you prefer, you can alternatively manage the compute resources for individual tasks by specifying an existing virtual warehouse when creating the task. This option requires that you choose a warehouse that is sized appropriately for the SQL actions that are executed by the task.

Choosing a Warehouse Size

If you choose to use existing warehouses to supply the compute resources for individual tasks, we recommend that you follow the best practices described in Warehouse Considerations. We suggest that you analyze the average run time for a single task or DAG of tasks using a specific warehouse based on warehouse size and clustering, as well as whether or not the warehouse is shared by multiple processes or is dedicated to running this single task (or DAG).

Query the TASK_HISTORY Account Usage view (in the SNOWFLAKE shared database). The average difference between the scheduled and completed times for a task is the expected average run time for the task, including any period in which the task was queued. A task is queued when other processes are currently using all of the compute resources in the warehouse.

Unless the SQL statements defined for the tasks can be optimized (either by rewriting the statements or using stored procedures), then this would be the expected average run time for the task (or DAG). Choose the right size for the warehouse based on your analysis to ensure the task (or DAG) finishes running within this window.

The following diagram shows a window of 1 minute in which a single task queued for 20 seconds and then ran for 40 seconds.

Example task batch window

The following diagram shows a DAG that requires 5 minutes on average to complete for each run. The diagram shows the window for 2 runs of the DAG to complete. This window is calculated from the time the root task is scheduled to start until the last child task in the DAG has completed running. In this example, the DAG is shared with other, concurrent operations that queue while each of the 3 tasks in the DAG is running. These concurrent operations consume all available resources when each task in the DAG finishes running and before the next task starts running. As a result, the window for each task includes some amount of queuing while it waits for other operations to finish and relinquish compute resources.

Note that even if this DAG ran on a dedicated warehouse, a brief lag would be expected after a predecessor task finishes running and any child task is executed; however, no queueing for shared resources with other operations would occur. The warehouse size you choose should be large enough to accommodate multiple child tasks that are triggered simultaneously by predecessor tasks.

Example DAG batch window

Recommendations for Choosing a Compute Model

The following table describes various factors that can help you decide when to use serverless tasks versus user-managed tasks:

Category

Serverless Tasks

User-managed Tasks

Notes

Number, duration, and predictability of concurrent task workloads

Recommended when you cannot fully utilize a warehouse because too few tasks run concurrently or they run to completion quickly (in less than 1 minute).

Because the size of compute resources chosen is based on the history of previous runs, tasks with relatively stable runs are good candidates for serverless tasks.

Recommended when you can fully utilize a single warehouse by scheduling multiple concurrent tasks to take advantage of available compute resources.

Also recommended for spiky or unpredictable loads on compute resources. Multi-cluster warehouses with auto-suspend and auto-resume enabled could help moderate your credit consumption.

For serverless tasks, Snowflake bills your account based on the actual compute resource usage.

In contrast, billing for user-managed warehouses is based on warehouse size, with a 60-second minimum each time the warehouse is resumed, regardless of the compute resources used.

Schedule interval

Recommended when adherence to the schedule interval is highly important.

If a run of a standalone task or scheduled DAG exceeds nearly all of this interval, Snowflake increases the size of the compute resources (to a maximum of the equivalent of a 2X-Large warehouse).

Recommended when adherence to the schedule interval is less important.

Schedule interval refers to the interval of time between successive scheduled executions of a standalone task or the root task in a DAG.

Note that increasing the compute resources reduces the execution time of some, but not all, SQL code and might not be sufficient to ensure a task run is completed within the batch window.

Task Scheduling

A standalone task or the root task in a DAG generally runs on a schedule. You can define the schedule when creating a task (using CREATE TASK) or later (using ALTER TASK).

Snowflake ensures only one instance of a task with a schedule (i.e. a standalone task or the root task in a DAG) is executed at a given time. If a task is still running when the next scheduled execution time occurs, then that scheduled time is skipped.

Snowflake automatically resizes and scales the compute resources for serverless tasks. For tasks that rely on a warehouse to provide compute resources, choose an appropriate warehouse size for a given task to complete its workload within the defined schedule. For information, see Choosing a Warehouse Size (in this topic).

Task Scheduling and Daylight Saving Time

The cron expression in a task definition supports specifying a time zone. A scheduled task runs according to the specified cron expression in the local time for a given time zone. Special care should be taken with regard to scheduling tasks for time zones that recognize daylight saving time. Tasks scheduled during specific times on days when the transition from standard time to daylight saving time (or the reverse) occurs can have unexpected behaviors.

For example:

  • During the autumn change from daylight saving time to standard time, a task scheduled to start at 1 AM in the America/Los_Angeles time zone (i.e. 0 1 * * * America/Los_Angeles) would run twice: once at 1 AM and then again when 1:59:59 AM shifts to 1:00:00 AM local time. That is, there are two points in time when the local time is 1 AM.

  • During the spring change from standard time to daylight saving time, a task scheduled to start at 2 AM in the America/Los_Angeles time zone (i.e. 0 2 * * * America/Los_Angeles) would not run at all because the local time shifts from 1:59:59 AM to 3:00:00 AM. That is, there is no point during that day when the local time is 2 AM.

To avoid unexpected task executions due to daylight saving time, either:

  • Do not schedule tasks to run at a specific time between 1 AM and 3 AM (daily, or on days of the week that include Sundays), or

  • Manually adjust the cron expression for tasks scheduled during those hours twice each year to compensate for the time change due to daylight saving time.

DAG of Tasks

A Directed Acyclic Graph (DAG) is a series of tasks composed of a single root task and additional tasks, organized by their dependencies. DAGs flow in a single direction, meaning a task later in the series cannot prompt the run of an earlier task (i.e. a loop). Each task (except the root task) can have multiple predecessor tasks (dependencies); likewise, each task can have multiple subsequent (child) tasks that depend on it. A task runs only after all of its predecessor tasks have run successfully to completion.

The root task should have a defined schedule that initiates a run of the DAG. Each of the other tasks has at least one defined predecessor to link the tasks in the DAG. A child task runs only after all of its predecessor tasks run successfully to completion.

You can specify the predecessor tasks when creating a new task (using CREATE TASK … AFTER) or later (using ALTER TASK … ADD AFTER). A DAG is limited to a maximum of 1000 tasks total (including the root task). A single task can have a maximum of 100 predecessor tasks and 100 child tasks.

In the following basic example, the root task prompts Tasks B and C to run simultaneously. Task D runs when both Tasks B and C have completed their runs.

Basic DAG example

The following practical example shows how a DAG could be used to update dimension tables in a sales database before aggregating fact data:

Sales database DAG example

A further example shows the concluding task in a DAG calling an external function to trigger a remote messaging service to send a notification that all previous tasks have run successfully to completion.

Cloud messaging notification DAG example

Task Ownership

All tasks in a DAG must have the same task owner (i.e. a single role must have the OWNERSHIP privilege on all of the tasks) and be stored in the same database and schema.

Transferring ownership of a task severs the dependency between this task and any predecessor and child tasks. For more information, see Link Severed Between Predecessor and Child Tasks (in this topic).

When ownership of all tasks in a DAG is transferred at once, through either of the following activities, the relationships between all tasks in the DAG are retained:

  • The current owner of all tasks that comprise the DAG is dropped (using DROP ROLE). Ownership of the objects owned by the dropped role is transferred to the role that executes the DROP ROLE command.

  • Ownership of all tasks that comprise the DAG is explicitly transferred to another role (e.g. by executing GRANT OWNERSHIP on all tasks in a schema).

Overlapping DAG Runs

By default, Snowflake ensures that only one instance of a particular DAG is allowed to run at a time. The next run of a root task is scheduled only after all tasks in the DAG have finished running. This means that if the cumulative time required to run all tasks in the DAG exceeds the explicit scheduled time set in the definition of the root task, at least one run of the DAG is skipped. The behavior is controlled by the ALLOW_OVERLAPPING_EXECUTION parameter on the root task; the default value is FALSE. Setting the parameter value to TRUE permits DAG runs to overlap.

In addition, a child task begins its run only after all predecessor tasks for the child task have successfully completed their own runs. A task that executes time-intensive SQL operations delays the start of any child task that identifies the task as a predecessor.

In the following example, a DAG run is scheduled to start when a prior run has not completed yet. The period of overlap, or concurrency, is identified in red. The diagram also identifies the span of time when each task is queued before running in the user-managed warehouse. Note that if Snowflake-managed compute resources are used, there is no queuing period:

Overlapping DAG runs

Overlapping runs may be tolerated (or even desirable) when read/write SQL operations executed by overlapping runs of a DAG do not produce incorrect or duplicate data. However, for other DAGs, task owners (i.e. the role with the OWNERSHIP privilege on all tasks in the DAG) should set an appropriate schedule on the root task and choose an appropriate warehouse size (or use Snowflake-managed compute resources) to ensure an instance of the DAG finishes to completion before the root task is next scheduled to run.

To better align a DAG with the schedule defined in the root task:

  1. If feasible, increase the scheduling time between runs of the root task.

  2. Consider modifying compute-heavy tasks to use Snowflake-managed compute resources. If the task relies on user-managed compute resources, increase the size of the warehouse that runs large or complex SQL statements or stored procedures in the DAG.

  3. Analyze the SQL statements or stored procedure executed by each task. Determine if code could be rewritten to leverage parallel processing.

If none of the above solutions help, consider whether it is necessary to allow concurrent runs of the DAG by setting ALLOW_OVERLAPPING_EXECUTION = TRUE on the root task. This parameter can be defined when creating a task (using CREATE TASK) or later (using ALTER TASK).

Viewing Dependent Tasks in a DAG

To view either the direct child tasks for a root task or all tasks in a DAG:

SQL

Query the TASK_DEPENDENTS table function (in the Snowflake Information Schema). Note that to retrieve all tasks in a DAG, input the root task when calling the function. If you input a child task, the function returns the children (and the children of those children, etc.) of the specified task.

DAG Runs with Suspended Child Tasks

When the root task is suspended, you can resume or suspend any child tasks using ALTER TASK … RESUME | SUSPEND. Resuming any suspended child tasks is not required before you resume the root task.

When a DAG runs with one or more suspended child tasks, the run ignores those tasks. A child task with multiple predecessors runs as long as at least one of the predecessors is in a resumed state, and all resumed predecessors run successfully to completion.

Resuming Suspended Tasks in a DAG

To recursively resume all tasks in a DAG, query the SYSTEM$TASK_DEPENDENTS_ENABLE function rather than resuming each task individually (using ALTER TASK … RESUME).

Versioning of Runs

When a standalone task or the root task in a DAG is first resumed (or manually executed using EXECUTE TASK), an initial version of the task is set. If the task is a root task, then a version of the entire DAG, including all properties for all tasks in the DAG, is set. The standalone task or DAG runs using this version. After a task is suspended and modified, a new version is set when the standalone or root task is resumed or manually executed.

To modify or recreate any task in a DAG, the root task must first be suspended (using ALTER TASK … SUSPEND). When the root task is suspended, all future scheduled runs of the root task are cancelled; however, if any tasks are currently running (i.e, the tasks in an EXECUTING state), these tasks and any descendent tasks continue to run using the current version.

Note

If the definition of a stored procedure called by a task changes while the DAG is executing, the new programming could be executed when the stored procedure is called by the task in the current run.

For example, suppose the root task in a DAG is suspended, but a scheduled run of this task has already started. The owner of all tasks in the DAG modifies the SQL code called by a child task while the root task is still running. The child task runs and executes the SQL code in its definition using the version of the DAG that was current when the root task started its run. When the root task is resumed or is manually executed, a new version of the DAG is set. This new version includes the modifications to the child task.

Setting Session Parameters for Tasks

You can set session parameters for the session in which a task runs. To do so, modify an existing task and set the desired parameter values (using ALTER TASKSET session_parameter = value[, session_parameter = value ... ]).

A task supports all session parameters. For the complete list, see Parameters.

Note that a task does not support account or user parameters.

Automatically Suspend Tasks After Failed Runs

Optionally suspend tasks automatically after a specified number of consecutive runs that either fail or time out. This feature can reduce costs by suspending tasks that consume Snowflake credits but fail to run to completion. Failed task runs include runs in which the SQL code in the task body either produces a user error or times out. Task runs that are skipped, canceled, or that fail due to a system error are considered indeterminate and are not included in the count of failed task runs.

Set the SUSPEND_TASK_AFTER_FAILURES = num parameter on a standalone task or the root task in a DAG. When the parameter is set to a value greater than 0, the following behavior applies to runs of the standalone task or DAG:

  • Standalone tasks are automatically suspended after the specified number of consecutive task runs either fail or time out.

  • The root task is automatically suspended after the run of any single task in a DAG fails or times out the specified number of times in consecutive runs.

The parameter can be set when creating a task (using CREATE TASK) or later (using ALTER TASK). The setting applies to tasks that rely on either Snowflake-managed compute resources (i.e. serverless compute model) or user-managed compute resources (i.e. a virtual warehouse).

The SUSPEND_TASK_AFTER_NUM_FAILURES parameter can also be set at the account, database, or schema level. The setting applies to all standalone or root tasks contained in the modified object. Note that explicitly setting the parameter at a lower (i.e. more granular) level overrides the parameter value set at a higher level.

Manually Executing Tasks

The EXECUTE TASK command manually triggers a single run of a scheduled task (either a standalone task or the root task in a DAG) independent of the schedule defined for the task. A successful run of a root task triggers a cascading run of child tasks in the DAG as their precedent task completes, as though the root task had run on its defined schedule.

This SQL command is useful for testing new or modified standalone tasks and DAGs before you enable them to execute SQL code in production.

Call this SQL command directly in scripts or in stored procedures. In addition, this command supports integrating tasks in external data pipelines. Any third-party services that can authenticate into your Snowflake account and authorize SQL actions can execute the EXECUTE TASK command to run tasks.

Note

Supported as a preview feature.

Viewing the Task History for Your Account

The following roles (or roles with the specified privileges) can use SQL to view the task history within a specified date range:

  • Account administrator (i.e. users with the ACCOUNTADMIN role).

  • Task owner (i.e. role that has the OWNERSHIP privilege on a task).

  • Any role that has the global MONITOR EXECUTION privilege.

To view the run history for a single task:

SQL

Query the TASK_HISTORY table function (in the Snowflake Information Schema).

To view details on a DAG run that is currently scheduled or is executing:

SQL

Query the CURRENT_TASK_GRAPHS table function (in the Snowflake Information Schema).

To view the history for DAG runs that executed successfully, failed, or were cancelled in the past 60 minutes:

SQL

Query the COMPLETE_TASK_GRAPHS table function (in the Snowflake Information Schema).

Query the COMPLETE_TASK_GRAPHS View view (in Account Usage).

Billing for Task Runs

The costs associated with running a task to execute SQL code differ depending on the source of the compute resources for the task:

User-managed warehouse

Snowflake bills your account for credit usage based on warehouse usage while a task is running, similar to the warehouse usage for executing the same SQL statements in a client or the Snowflake web interface. Per-second credit billing and warehouse auto-suspend give you the flexibility to start with larger warehouse sizes and then adjust the size to match your task workloads.

Snowflake-managed resources (i.e. serverless compute model)

Snowflake bills your account based on the actual compute resource usage; in contrast with customer-managed virtual warehouses, which consume credits when active, and may sit idle or be overutilized. Charges are calculated based on total usage of the resources (including cloud service usage) measured in compute-hours credit usage.

Snowflake analyzes task runs in the task history dynamically to determine the ideal size of the compute resources, and suspends these compute resources to save costs. Snowflake manages load capacity, ensuring optimal compute resources to meet demand. To recover the management costs of Snowflake-provided compute resources, we apply a 1.5x multiplier to resource consumption. Note, however, that the serverless compute model could still reduce compute costs over user-managed warehouses; in some cases significantly.

Snowflake credits charged per compute-hour:

  • Snowflake-managed compute resources: 1.5

  • Cloud services: 1

Billing is similar to other Snowflake features such as Automatic Clustering of tables, Database Replication and Failover/Failback, and Snowpipe.

To retrieve the current credit usage for a specific task, query the SERVERLESS_TASK_HISTORY table function. Execute the following statement as the task owner (i.e. the role that has the OWNERSHIP privilege on the task):

SET num_credits = (SELECT SUM(credits_used)
  FROM TABLE(<database_name>.information_schema.serverless_task_history(
    date_range_start=>dateadd(D, -1, current_timestamp()),
    date_range_end=>dateadd(D, 1, current_timestamp()),
    task_name => '<task_name>')
    )
  );

Where:

<database_name>

Name of the database that contains the task.

<task_name>

Name of the task.

To retrieve the current credit usage for all serverless tasks, query the SERVERLESS_TASK_HISTORY view. Execute the following statement as an account administrator (i.e. a user with the ACCOUNTADMIN role):

SELECT START_TIME, END_TIME, TASK_ID, TASK_NAME, CREDITS_USED, SCHEMA_ID, SCHEMA_NAME, DATABASE_ID, DATABASE_NAME
  FROM snowflake.account_usage.serverless_task_history
  ORDER BY start_time, task_id;

Understanding the System Service

Snowflake runs tasks with the privileges of the task owner (i.e. the role that has OWNERSHIP privilege on the task), but task runs are not associated with a user. Instead, each run is executed by a system service. Tasks are decoupled from specific users to avoid complications that can arise when users are dropped, locked due to authentication issues, or have roles removed.

Because task runs are decoupled from a user, the query history for task runs are associated with the system service. SYSTEM is not a user in the account; it is a behind-the-scenes service. As such, there are no user credentials for this service, and no individual (from Snowflake or in your account) can assume its identity. Activity for the system service is limited to your account. The same encryption protections and other security protocols are built into this service as are enforced for other operations.

Task DDL

To support creating and managing tasks, Snowflake provides the following set of special DDL commands:

In addition, providers can view, grant, or revoke access to the necessary database objects for ELT using the following standard access control DDL:

Task Functions

To support retrieving information about tasks, Snowflake provides the following set of SQL functions:

Task Security

Access Control Privileges

Creating Tasks

Creating tasks requires a role with a minimum of the following privileges:

Object

Privilege

Notes

Account

EXECUTE MANAGED TASK

Required only for tasks that rely on Snowflake-managed compute resources (serverless compute model).

Database

USAGE

Schema

USAGE, CREATE TASK

Warehouse

USAGE

Required only for tasks that rely on user-managed warehouses for compute resources.

Owning Tasks

After a task is created, the task owner (i.e. the role that has the OWNERSHIP privilege on the task) must have the following privileges:

Object

Privilege

Notes

Account

EXECUTE TASK

Required to run any tasks the role owns. Revoking the EXECUTE TASK privilege on a role prevents all subsequent task runs from starting under that role.

Account

EXECUTE MANAGED TASK

Required only for tasks that rely on Snowflake-managed compute resources.

Database

USAGE

Schema

USAGE

Task

OWNERSHIP

Warehouse

USAGE

In addition, the role must have the permissions required to run the SQL statement executed by the task.

Resuming or Suspending Tasks

In addition to the task owner, a role that has the OPERATE privilege on the task can suspend or resume the task. This role must have the USAGE privilege on the database and schema that contain the task. No other privileges are required.

When a task is resumed, Snowflake verifies that the task owner role has the privileges listed in Owning Tasks (in this topic).

Creating a Task Administrator Role

For ease of use, we recommend creating a custom role (e.g. taskadmin) and assigning the EXECUTE TASK privilege to this role. Any role that can grant privileges (e.g. SECURITYADMIN or any role with the MANAGE GRANTS privilege) can then grant this custom role to any task owner role to allow altering their own tasks. To remove the ability for the task owner role to execute the task, it is only necessary to revoke this custom role from the task owner role. Note that if you choose not to create this custom role, an account administrator must revoke the EXECUTE TASK privilege from the task owner role.

For example, create a custom role name taskadmin and grant that role the EXECUTE TASK privilege. Assign the taskadmin role to a task owner role named myrole:

USE ROLE securityadmin;

CREATE ROLE taskadmin;

-- set the active role to ACCOUNTADMIN before granting the account-level privileges to the new role
USE ROLE accountadmin;

GRANT EXECUTE TASK, EXECUTE MANAGED TASK ON ACCOUNT TO ROLE taskadmin;

-- set the active role to SECURITYADMIN to show that this role can grant a role to another role
USE ROLE securityadmin;

GRANT ROLE taskadmin TO ROLE myrole;

For more information on creating custom roles and role hierarchies, see Configuring Access Control.

Dropping a Task Owner Role

When the owner role of a given task (i.e. the role with the OWNERSHIP privilege on the task) is deleted, the task is “re-possessed” by the role that dropped the owner role. This ensures that ownership moves to a role that is closer to the root of the role hierarchy. When a task is re-possessed, it is automatically paused, i.e., all executions currently in flight complete processing, but new executions will not be scheduled until the task is resumed explicitly by the new owner. The rationale for this is to prevent a user with access to a particular role from leaving behind tasks that suddenly execute with higher permissions when the role is removed.

If the role that a running task is executing under is dropped while the task is running, the task completes processing under the dropped role.

Workflow

This section provides a high-level overview of the task setup workflow.

  1. Complete the steps in Creating a Task Administrator Role (in this topic) to create a role that can be used to execute the commands in the following steps.

  2. Create a task using CREATE TASK. The task is suspended by default.

    Note

    Verify the SQL statement that you will reference in a task executes as expected before you create the task. Tasks are intended to automate SQL statements or stored procedures that have already been tested thoroughly.

  3. Execute ALTER TASK … RESUME to allow the task to run based on the parameters specified in the task definition.

Back to top