Introduction to tasks¶
A task can execute any one of the following types of SQL code:
Single SQL statement
Call to a stored procedure
Procedural logic using Snowflake Scripting
Tasks can be combined with table streams for continuous ELT workflows to process recently changed table rows. Streams ensure exactly-once semantics for new or changed data in a table.
Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table, or to perform other periodic work.
Tasks require compute resources to execute SQL code. Either of the following compute models can be chosen for individual tasks:
Serverless compute model
User-managed virtual warehouse
The serverless compute model for tasks enables you to rely on compute resources managed by Snowflake instead of user-managed virtual warehouses. Serverless compute resources are automatically resized and scaled up or down by Snowflake as required for each workload. Snowflake determines the ideal size of serverless compute resources for a given run based on a dynamic analysis of statistics for the most recent runs of the same task. The maximum compute size for a serverless task is equivalent to an XXLARGE user-managed virtual warehouse. Multiple workloads in your account share a common set of compute resources.
To enable the serverless compute model, you must omit the WAREHOUSE parameter when you create a task using CREATE TASK. Note that the role that executes the CREATE TASK command must have the global EXECUTE MANAGED TASK privilege. For more information about the access control requirements for tasks, see Task security.
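For illustration, a serverless task is created by simply omitting the WAREHOUSE parameter (the task, schema, and table names below are placeholders):

```sql
-- Create a serverless task: no WAREHOUSE parameter is specified,
-- so Snowflake manages and sizes the compute resources automatically.
-- The executing role needs the global EXECUTE MANAGED TASK privilege.
CREATE TASK mydb.myschema.serverless_task
  SCHEDULE = '60 MINUTE'
  AS
    INSERT INTO mydb.myschema.report_table
      SELECT CURRENT_TIMESTAMP();
```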
Billing for runs of serverless tasks is different from the standard credit consumption model for tasks that rely on virtual warehouses for compute resources. For more information, see Task costs.
Serverless tasks cannot invoke the following object types and functions:
UDFs (user-defined functions) that contain Java or Python code.
Stored procedures written in Scala (using Snowpark), or which call UDFs that contain Java or Python code.
Serverless tasks support the use of hybrid tables, but you might not experience optimal performance or efficiency while using hybrid tables.
If you prefer, you can alternatively manage the compute resources for individual tasks by specifying an existing virtual warehouse when creating the task. This option requires that you choose a warehouse that is sized appropriately for the SQL actions that are executed by the task.
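A minimal sketch of the user-managed alternative, assuming an existing warehouse named mywh (all object names are placeholders):

```sql
-- Create a task that runs on an existing user-managed warehouse.
-- The warehouse must be sized appropriately for the task's SQL workload.
CREATE TASK mydb.myschema.warehouse_task
  WAREHOUSE = mywh
  SCHEDULE = '60 MINUTE'
  AS
    INSERT INTO mydb.myschema.report_table
      SELECT CURRENT_TIMESTAMP();
```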
Choosing a warehouse size¶
If you choose to use existing warehouses to supply the compute resources for individual tasks, we recommend that you follow the best practices described in Warehouse Considerations. We suggest that you analyze the average run time for a single task or task graph using a specific warehouse based on warehouse size and clustering, as well as whether or not the warehouse is shared by multiple processes or is dedicated to running this single task or task graph.
Query the TASK_HISTORY Account Usage view (in the SNOWFLAKE shared database). The average difference between the scheduled and completed times for a task is the expected average run time for the task, including any period in which the task was queued. A task is queued when other processes are currently using all of the compute resources in the warehouse.
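One way to estimate this expected average run time is to compare the scheduled and completed times in that view; a sketch, with the task name as a placeholder:

```sql
-- Average seconds between the scheduled and completed times for a task,
-- which includes any time the task spent queued in the warehouse.
SELECT name,
       AVG(DATEDIFF('second', scheduled_time, completed_time)) AS avg_run_seconds
  FROM snowflake.account_usage.task_history
  WHERE name = 'MYTASK'
    AND state = 'SUCCEEDED'
  GROUP BY name;
```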
Unless the SQL statements defined for the tasks can be optimized (either by rewriting the statements or using stored procedures), this is the expected average run time for the task or task graph. Choose a warehouse size based on your analysis to ensure the task or task graph finishes running within this window.
The following diagram shows a window of 1 minute in which a single task queued for 20 seconds and then ran for 40 seconds.
The following diagram shows a task graph that requires 5 minutes on average to complete for each run. The diagram shows the window for 2 runs of the task graph to complete. This window is calculated from the time the root task is scheduled to start until the last child task in the task graph has completed running. In this example, the task graph is shared with other, concurrent operations that queue while each of the 3 tasks in the task graph are running. These concurrent operations consume all available resources when each task in the task graph finishes running and before the next task starts running. As a result, the window for each task includes some amount of queuing while it waits for other operations to finish and relinquish compute resources.
Note that even if this task graph ran on a dedicated warehouse, a brief lag would be expected after a predecessor task finishes running and any child task is executed; however, no queueing for shared resources with other operations would occur. The warehouse size you choose should be large enough to accommodate multiple child tasks that are triggered simultaneously by predecessor tasks.
Recommendations for choosing a compute model¶
The following table describes various factors that can help you decide when to use serverless tasks versus user-managed tasks:
Number, duration, and predictability of concurrent task workloads

Serverless tasks: Recommended when you cannot fully utilize a warehouse because too few tasks run concurrently or they run to completion quickly (in less than 1 minute). Because the size of compute resources is chosen based on the history of previous runs, tasks with relatively stable runs are good candidates for serverless tasks.

User-managed tasks: Recommended when you can fully utilize a single warehouse by scheduling multiple concurrent tasks to take advantage of available compute resources.

Billing

Serverless tasks: Snowflake bills your account based on the actual compute resource usage.

User-managed tasks: Billing is based on warehouse size, with a 60-second minimum each time the warehouse is resumed, regardless of the compute resources used.

Adherence to the schedule interval

Schedule interval refers to the interval of time between successive scheduled executions of a standalone task or the root task in a task graph.

Serverless tasks: Recommended when adherence to the schedule interval is highly important. If a run of a standalone task or scheduled task graph exceeds nearly all of this interval, Snowflake increases the size of the compute resources (to a maximum of the equivalent of a 2X-Large warehouse). Note that the maximum size for a serverless task run is equivalent to an XXLARGE warehouse; if a task workload requires a larger warehouse, create a user-managed task that references a warehouse of the required size.

User-managed tasks: Recommended when adherence to the schedule interval is less important. Note that increasing the compute resources reduces the execution time of some, but not all, SQL code and might not be sufficient to ensure a task run is completed within the batch window.
Task scheduling¶
A standalone task or the root task in a task graph generally runs on a schedule. You can define the schedule when creating a task (using CREATE TASK) or later (using ALTER TASK).
Snowflake ensures only one instance of a task with a schedule (such as a standalone task or the root task in a task graph) is executed at a given time. If a task is still running when the next scheduled execution time occurs, then that scheduled time is skipped.
Snowflake automatically resizes and scales the compute resources for serverless tasks. For tasks that rely on a warehouse to provide compute resources, choose an appropriate warehouse size for a given task to complete its workload within the defined schedule. For information, see Choosing a Warehouse Size (in this topic).
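For illustration, a schedule can be defined at creation or changed later (all names are placeholders):

```sql
-- Define a schedule when creating the task ...
CREATE TASK mytask
  WAREHOUSE = mywh
  SCHEDULE = '60 MINUTE'
  AS SELECT CURRENT_TIMESTAMP();

-- ... or change it later. A task must be suspended before it is altered.
ALTER TASK mytask SUSPEND;
ALTER TASK mytask SET SCHEDULE = 'USING CRON 0 9 * * MON-FRI America/Los_Angeles';
ALTER TASK mytask RESUME;
```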
Task scheduling and daylight saving time¶
The cron expression in a task definition supports specifying a time zone. A scheduled task runs according to the specified cron expression in the local time for a given time zone. Special care should be taken with regard to scheduling tasks for time zones that recognize daylight saving time. Tasks scheduled during specific times on days when the transition from standard time to daylight saving time (or the reverse) occurs can have unexpected behaviors.
During the autumn change from daylight saving time to standard time, a task scheduled to start at 1 AM in the America/Los_Angeles time zone (i.e. 0 1 * * * America/Los_Angeles) would run twice: once at 1 AM and then again when 1:59:59 AM shifts to 1:00:00 AM local time. That is, there are two points in time when the local time is 1 AM.
During the spring change from standard time to daylight saving time, a task scheduled to start at 2 AM in the America/Los_Angeles time zone (i.e. 0 2 * * * America/Los_Angeles) would not run at all because the local time shifts from 1:59:59 AM to 3:00:00 AM. That is, there is no point during that day when the local time is 2 AM.
To avoid unexpected task executions due to daylight saving time, use one of the following:
Do not schedule tasks to run at a specific time between 1 AM and 3 AM (daily, or on days of the week that include Sundays), or
Manually adjust the cron expression for tasks scheduled during those hours twice each year to compensate for the time change due to daylight saving time, or
Use a time format that does not apply daylight saving time, such as UTC.
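The UTC option above can be sketched as follows (the task name is a placeholder):

```sql
-- UTC does not observe daylight saving time, so this task runs
-- exactly once per day at 02:00 UTC year-round.
ALTER TASK mytask SET SCHEDULE = 'USING CRON 0 2 * * * UTC';
```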
Task graphs¶
A task graph, or Directed Acyclic Graph (DAG), is a series of tasks composed of a single root task and additional tasks, organized by their dependencies. DAGs flow in a single direction, meaning a task later in the series cannot prompt the run of an earlier task. Each task (except the root task) can have multiple predecessor tasks (dependencies), and each task can have multiple subsequent (child) tasks that depend on it.
The root task should have a defined schedule that initiates a run of the task graph. Each of the other tasks has at least one defined predecessor that links it into the task graph. A child task runs only after all of its predecessor tasks have run successfully to completion.
You can specify the predecessor tasks when creating a new task (using CREATE TASK … AFTER) or later (using ALTER TASK … ADD AFTER). A task graph is limited to a maximum of 1000 tasks total (including the root task). A single task can have a maximum of 100 predecessor tasks and 100 child tasks.
You can view your task graphs using SQL or Snowsight. To view task graphs in Snowsight, see Viewing Individual Task Graphs.
In the following basic example, the root task prompts Tasks B and C to run simultaneously. Task D runs when both Tasks B and C have completed their runs.
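This basic example can be sketched as follows (all task, warehouse, and procedure names are placeholders, and the stored procedures are hypothetical):

```sql
-- Root task with a schedule; tasks B and C depend on it; D depends on both.
CREATE TASK root_task
  WAREHOUSE = mywh
  SCHEDULE = '60 MINUTE'
  AS CALL prepare_data();          -- hypothetical stored procedure

CREATE TASK task_b WAREHOUSE = mywh AFTER root_task
  AS CALL update_dimension_b();    -- hypothetical stored procedure

CREATE TASK task_c WAREHOUSE = mywh AFTER root_task
  AS CALL update_dimension_c();    -- hypothetical stored procedure

-- Task D lists both predecessors; it runs only after B and C complete.
CREATE TASK task_d WAREHOUSE = mywh AFTER task_b, task_c
  AS CALL aggregate_facts();       -- hypothetical stored procedure
```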
The following practical example shows how a task graph could be used to update dimension tables in a sales database before aggregating fact data:
A further example shows the concluding task in a task graph calling an external function to trigger a remote messaging service to send a notification that all previous tasks have run successfully to completion.
All tasks in a task graph must have the same task owner (a single role must have the OWNERSHIP privilege on all of the tasks) and be stored in the same database and schema.
Transferring ownership of a task severs the dependency between this task and any predecessor and child tasks. For more information, see Link Severed Between Predecessor and Child Tasks (in this topic).
When ownership of all tasks in a task graph is transferred at once, through either of the following activities, the relationships between all tasks in the task graph are retained:
The current owner of all tasks that comprise the task graph is dropped (using DROP ROLE). Ownership of the objects owned by the dropped role is transferred to the role that executes the DROP ROLE command.
Ownership of all tasks that comprise the task graph is explicitly transferred to another role (e.g. by executing GRANT OWNERSHIP on all tasks in a schema).
Overlapping task graph runs¶
By default, Snowflake ensures that only one instance of a particular task graph is allowed to run at a time. The next run of a root task is scheduled only after all tasks in the task graph have finished running. This means that if the cumulative time required to run all tasks in the task graph exceeds the explicit scheduled time set in the definition of the root task, at least one run of the task graph is skipped. The behavior is controlled by the ALLOW_OVERLAPPING_EXECUTION parameter on the root task; the default value is FALSE. Setting the parameter value to TRUE permits task graph runs to overlap.
In addition, a child task begins its run only after all predecessor tasks for the child task have successfully completed their own runs. A task that executes time-intensive SQL operations delays the start of any child task that identifies the task as a predecessor.
In the following example, a task graph run is scheduled to start when a prior run has not completed yet. The period of overlap, or concurrency, is identified in red. The diagram also identifies the span of time when each task is queued before running in the user-managed warehouse. Note that if you use serverless compute resources, there is no queuing period:
Overlapping runs may be tolerated (or even desirable) when read/write SQL operations executed by overlapping runs of a task graph do not produce incorrect or duplicate data. However, for other task graphs, task owners (the role with the OWNERSHIP privilege on all tasks in the task graph) should set an appropriate schedule on the root task and choose an appropriate warehouse size (or use serverless compute resources) to ensure an instance of the task graph finishes to completion before the root task is next scheduled to run.
To better align a task graph with the schedule defined in the root task:
If feasible, increase the scheduling time between runs of the root task.
Consider modifying compute-heavy tasks to use serverless compute resources. If the task relies on user-managed compute resources, increase the size of the warehouse that runs large or complex SQL statements or stored procedures in the task graph.
Analyze the SQL statements or stored procedure executed by each task. Determine if code could be rewritten to leverage parallel processing.
If none of the above solutions help, consider whether it is necessary to allow concurrent runs of the task graph by setting ALLOW_OVERLAPPING_EXECUTION = TRUE on the root task. This parameter can be defined when creating a task (using CREATE TASK) or later (using ALTER TASK).
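A sketch of that last option (the root task name is a placeholder):

```sql
-- Permit overlapping runs of the task graph rooted at root_task.
-- The task must be suspended before it is altered.
ALTER TASK root_task SUSPEND;
ALTER TASK root_task SET ALLOW_OVERLAPPING_EXECUTION = TRUE;
ALTER TASK root_task RESUME;
```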
Viewing dependent tasks in a task graph¶
To view either the direct child tasks for a root task or all tasks in a task graph:
Query the TASK_DEPENDENTS table function (in the Snowflake Information Schema). Note that to retrieve all tasks in a task graph, input the root task when calling the function. If you input a child task, the function returns the children (and the children of those children, etc.) of the specified task.
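For example, to list every task in the graph rooted at a given task (the database, schema, and task names are placeholders):

```sql
-- Returns the root task and all of its direct and indirect child tasks.
SELECT *
  FROM TABLE(mydb.INFORMATION_SCHEMA.TASK_DEPENDENTS(
    TASK_NAME => 'mydb.myschema.root_task',
    RECURSIVE => TRUE));
```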
Task graph runs with suspended child tasks¶
When the root task is suspended, you can resume or suspend any child tasks using ALTER TASK … RESUME | SUSPEND. Resuming any suspended child tasks is not required before you resume the root task.
When a task graph runs with one or more suspended child tasks, the run ignores those tasks. A child task with multiple predecessors runs as long as at least one of the predecessors is in a resumed state, and all resumed predecessors run successfully to completion.
Resuming suspended tasks in a task graph¶
To recursively resume all tasks in a task graph, query the SYSTEM$TASK_DEPENDENTS_ENABLE function rather than resuming each task individually (using ALTER TASK … RESUME).
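For example (the task name is a placeholder):

```sql
-- Resumes the root task and all of its dependent tasks in a single call.
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('mydb.myschema.root_task');
```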
Finalizer tasks¶
A finalizer task handles the release and cleanup of resources that a task graph uses. The finalizer task is guaranteed to run if the task graph is executed, and ensures proper resource cleanup and completion of necessary steps in all scenarios. For example, if a task graph run uses intermediate tables to track data for processing and fails before the table rows are consumed, the next run will encounter duplicate rows and reprocess data, resulting in longer execution times or wasted compute resources. The finalizer task can address this issue by dropping the rows or truncating the table as needed.
The finalizer task works like any other task in a task graph, with the following differences:
A finalizer task is always associated with a root task. Each root task can only have one finalizer task, and a finalizer task can only be associated with one root task.
A finalizer task is scheduled only when no other tasks are running or queued in the current task graph run, and at least one task in the graph has begun execution. If a graph is skipped (for example, the root task is skipped), the finalizer task will not run. If ALLOW_OVERLAPPING_EXECUTION is true, the finalizer task will behave like the other tasks and will still be scheduled even if there are other ongoing task graph runs.
A finalizer task cannot have any child tasks. Any command that tries to make the finalizer task a predecessor will fail. The creation of a finalizer task must include the FINALIZE keyword, which is incompatible with both the AFTER keyword and the SCHEDULE parameter.
To create a finalizer task, create a task using the FINALIZE keyword and set a relationship to the root task:
CREATE TASK <TASK_NAME> ...
  FINALIZE = <ROOT_TASK_NAME>
For more information, see CREATE TASK.
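A concrete sketch, assuming a root task named root_task and a hypothetical intermediate table:

```sql
-- Finalizer task: runs after each task graph run to clean up,
-- even if tasks in the graph failed. It cannot have a SCHEDULE
-- or AFTER clause, and it cannot have child tasks.
CREATE TASK cleanup_task
  WAREHOUSE = mywh
  FINALIZE = root_task
  AS
    TRUNCATE TABLE mydb.myschema.intermediate_rows;
```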
In Snowsight, a finalizer task shows as a separate task with its own run history and configuration details. A task graph view doesn’t show the finalizer task or the relationship between the root task and the finalizer task.
Versioning of runs¶
When a standalone task or the root task in a task graph is first resumed (or manually executed using EXECUTE TASK), an initial version of the task is set. If the task is a root task, then a version of the entire task graph, including all properties for all tasks in the task graph, is set. The standalone task or task graph runs using this version. After a task is suspended and modified, a new version is set when the standalone or root task is resumed or manually executed.
To modify or recreate any task in a task graph, the root task must first be suspended (using ALTER TASK … SUSPEND). When the root task is suspended, all future scheduled runs of the root task are cancelled; however, if any tasks are currently running (i.e., tasks in an EXECUTING state), these tasks and any descendant tasks continue to run using the current version.
If the definition of a stored procedure called by a task changes while the task graph is executing, the new code could be executed when the stored procedure is called by the task in the current run.
For example, suppose the root task in a task graph is suspended, but a scheduled run of this task has already started. The owner of all tasks in the task graph modifies the SQL code called by a child task while the root task is still running. The child task runs and executes the SQL code in its definition using the version of the task graph that was current when the root task started its run. When the root task is resumed or is manually executed, a new version of the task graph is set. This new version includes the modifications to the child task.
Setting session parameters for tasks¶
You can set session parameters for the session in which a task runs. To do so, modify an existing task and set the desired parameter values (using ALTER TASK … SET session_parameter = value[, session_parameter = value ... ]).
A task supports all session parameters. For the complete list, see Parameters.
Note that a task does not support account or user parameters.
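For illustration (the task name is a placeholder, and the parameters shown are arbitrary session parameters):

```sql
-- Set session parameters for the session in which the task runs.
-- The task must be suspended before it is altered.
ALTER TASK mytask SUSPEND;
ALTER TASK mytask SET
  TIMEZONE = 'America/Los_Angeles',
  TIMESTAMP_OUTPUT_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
ALTER TASK mytask RESUME;
```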
Automatically suspend tasks after failed runs¶
Optionally suspend tasks automatically after a specified number of consecutive runs that either fail or time out. This feature can reduce costs by suspending tasks that consume Snowflake credits but fail to run to completion. Failed task runs include runs in which the SQL code in the task body either produces a user error or times out. Task runs that are skipped, canceled, or that fail due to a system error are considered indeterminate and are not included in the count of failed task runs.
To use this feature, set the SUSPEND_TASK_AFTER_NUM_FAILURES = num parameter on a standalone task or the root task in a task graph. When the parameter is set to a value greater than 0, the following behavior applies to runs of the standalone task or the root task in a task graph:
Standalone tasks are automatically suspended after the specified number of consecutive task runs either fail or time out.
The root task is automatically suspended after any child task in the task graph consecutively fails or times out the specified number of times. The child task that fails or times out is not suspended.
The SUSPEND_TASK_AFTER_NUM_FAILURES parameter can also be set at the account, database, or schema level. The setting applies to all standalone or root tasks contained in the modified object. Note that explicitly setting the parameter at a lower level overrides the parameter value set at a higher level.
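For example (the task, database, and schema names are placeholders):

```sql
-- Suspend the task after 3 consecutive failed or timed-out runs.
ALTER TASK mytask SET SUSPEND_TASK_AFTER_NUM_FAILURES = 3;

-- The parameter can also be set at a higher level, e.g. for a schema;
-- it then applies to all standalone or root tasks in that schema.
ALTER SCHEMA mydb.myschema SET SUSPEND_TASK_AFTER_NUM_FAILURES = 3;
```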
Manually executing tasks¶
The EXECUTE TASK command manually triggers a single run of a scheduled task (either a standalone task or the root task in a task graph) independent of the schedule defined for the task. A successful run of a root task triggers a cascading run of child tasks in the task graph as their predecessor tasks complete, as though the root task had run on its defined schedule.
This SQL command is useful for testing new or modified standalone tasks and task graphs before you enable them to execute SQL code in production.
Call this SQL command directly in scripts or in stored procedures. In addition, this command supports integrating tasks in external data pipelines. Any third-party services that can authenticate into your Snowflake account and authorize SQL actions can execute the EXECUTE TASK command to run tasks.
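For example, to trigger a single ad hoc run (the task name is a placeholder):

```sql
-- Manually trigger one run of the task, independent of its schedule.
-- For a root task, this also triggers the rest of the task graph.
EXECUTE TASK mytask;
```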
Viewing the task history for your account¶
You can view the task history for your account using SQL or Snowsight. To view task history in Snowsight, refer to Viewing Task History in Snowsight.
The following roles (or roles with the specified privileges) can use SQL to view the task history within a specified date range:
Account administrator (i.e. users with the ACCOUNTADMIN role).
Task owner (i.e. role that has the OWNERSHIP privilege on a task).
Any role that has the global MONITOR EXECUTION privilege.
To view the run history for a single task:
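A query sketch using the TASK_HISTORY table function in the Snowflake Information Schema, run in the database that contains the task (the task name is a placeholder):

```sql
SELECT *
  FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    TASK_NAME => 'mytask'))
  ORDER BY scheduled_time DESC;
```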
To view details on a task graph run that is currently scheduled or is executing:
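A sketch using the CURRENT_TASK_GRAPHS table function in the Snowflake Information Schema (the root task name is a placeholder):

```sql
SELECT *
  FROM TABLE(INFORMATION_SCHEMA.CURRENT_TASK_GRAPHS(
    ROOT_TASK_NAME => 'MY_ROOT_TASK'));
```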
To view the history for task graph runs that executed successfully, failed, or were cancelled in the past 60 minutes:
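A sketch using the COMPLETE_TASK_GRAPHS table function in the Snowflake Information Schema:

```sql
-- Set ERROR_ONLY => TRUE to return only failed or cancelled graph runs.
SELECT *
  FROM TABLE(INFORMATION_SCHEMA.COMPLETE_TASK_GRAPHS(
    ERROR_ONLY => FALSE));
```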
Task costs¶
The costs associated with running a task to execute SQL code differ depending on the source of the compute resources for the task:
- User-managed warehouse
Snowflake bills your account for credit usage based on warehouse usage while a task is running, similar to the warehouse usage for executing the same SQL statements in a client or the Snowflake web interface. Per-second credit billing and warehouse auto-suspend give you the flexibility to start with larger warehouse sizes and then adjust the size to match your task workloads.
- Serverless compute model
Snowflake bills your account based on the actual compute resource usage, in contrast with customer-managed virtual warehouses, which consume credits when active and may sit idle or be overutilized. Charges are calculated based on total usage of the resources (including cloud service usage), measured in compute-hours of credit usage.
Snowflake analyzes task runs in the task history dynamically to determine the ideal size of the serverless compute resources, and suspends these compute resources to save costs. Snowflake manages load capacity, ensuring optimal compute resources to meet demand. To recover the management costs of serverless compute resources, Snowflake applies a 1.2x multiplier to resource consumption. The serverless compute model can still reduce compute costs over user-managed warehouses; in some cases significantly.
To learn how many credits are consumed by tasks, refer to the “Serverless Feature Credit Table” in the Snowflake service consumption table.
Billing is similar to other Snowflake features such as Automatic Clustering of tables, Replication and Failover/Failback, and Snowpipe.
To retrieve the current credit usage for a specific task, query the SERVERLESS_TASK_HISTORY table function. Execute the following statement as the task owner:
SET num_credits = (
  SELECT SUM(credits_used)
    FROM TABLE(<database_name>.information_schema.serverless_task_history(
      date_range_start => DATEADD(D, -1, CURRENT_TIMESTAMP()),
      date_range_end   => DATEADD(D, 1, CURRENT_TIMESTAMP()),
      task_name        => '<task_name>')));
Where:
<database_name>
Name of the database that contains the task.
<task_name>
Name of the task.
To retrieve the current credit usage for all serverless tasks, query the SERVERLESS_TASK_HISTORY view. Execute the following statement as an account administrator:
SELECT start_time,
       end_time,
       task_id,
       task_name,
       credits_used
  FROM snowflake.account_usage.serverless_task_history
  ORDER BY start_time, task_id;
Understanding the system service¶
Snowflake runs tasks with the privileges of the task owner (i.e. the role that has OWNERSHIP privilege on the task), but task runs are not associated with a user. Instead, each run is executed by a system service. Tasks are decoupled from specific users to avoid complications that can arise when users are dropped, locked due to authentication issues, or have roles removed.
Tasks execute with the privileges of the task owner whether they run on a schedule or are run manually (using EXECUTE TASK) by another role with the OPERATE privilege on the task.
Because task runs are decoupled from a user, the query history for task runs is associated with the system service. SYSTEM is not a user in the account; it is a behind-the-scenes service. As such, there are no user credentials for this service, and no individual (from Snowflake or in your account) can assume its identity. Activity for the system service is limited to your account. The same encryption protections and other security protocols are built into this service as are enforced for other operations.
Task DDL¶
To support creating and managing tasks, Snowflake provides the following set of special DDL commands:
CREATE TASK
ALTER TASK
DROP TASK
DESCRIBE TASK
SHOW TASKS
In addition, providers can view, grant, or revoke access to the necessary database objects for ELT using the following standard access control DDL:
GRANT <privileges>
REVOKE <privileges>
To support retrieving information about tasks, Snowflake provides the following set of SQL functions:
TASK_HISTORY
TASK_DEPENDENTS
SYSTEM$TASK_DEPENDENTS_ENABLE
Access control privileges¶
Creating tasks requires a role with a minimum of the following privileges:
EXECUTE MANAGED TASK (on the account)
Required only for tasks that rely on serverless compute resources.
USAGE (on the warehouse)
Required only for tasks that rely on user-managed warehouses for compute resources.
CREATE TASK (on the schema)
Required to create tasks in the schema.
After a task is created, the task owner must have the following privileges for the task to run:
EXECUTE TASK (on the account)
Required to run any tasks the role owns. Revoking the EXECUTE TASK privilege on a role prevents all subsequent task runs from starting under that role.
EXECUTE MANAGED TASK (on the account)
Required only for tasks that rely on serverless compute resources.
USAGE (on the warehouse)
Required only for tasks that rely on user-managed warehouses for compute resources.
In addition, the role must have the permissions required to run the SQL statement executed by the task.
Resuming or suspending tasks¶
In addition to the task owner, a role that has the OPERATE privilege on the task can suspend or resume the task. This role must have the USAGE privilege on the database and schema that contain the task. No other privileges are required.
When a task is resumed, Snowflake verifies that the task owner role has the privileges listed in Owning Tasks (in this topic).
Creating a task administrator role¶
For ease of use, we recommend creating a custom role (e.g. taskadmin) and assigning the EXECUTE TASK privilege to this role. Any role that can grant privileges (e.g. SECURITYADMIN or any role with the MANAGE GRANTS privilege) can then grant this custom role to any task owner role to allow altering their own tasks. To remove the ability for the task owner role to execute the task, it is only necessary to revoke this custom role from the task owner role. Note that if you choose not to create this custom role, an account administrator must revoke the EXECUTE TASK privilege from the task owner role.
For example, create a custom role named taskadmin and grant that role the EXECUTE TASK privilege. Assign the taskadmin role to a task owner role named myrole:
USE ROLE securityadmin;
CREATE ROLE taskadmin;
-- set the active role to ACCOUNTADMIN before granting the account-level privileges to the new role
USE ROLE accountadmin;
GRANT EXECUTE TASK, EXECUTE MANAGED TASK ON ACCOUNT TO ROLE taskadmin;
-- set the active role to SECURITYADMIN to show that this role can grant a role to another role
USE ROLE securityadmin;
GRANT ROLE taskadmin TO ROLE myrole;
For more information on creating custom roles and role hierarchies, see Configuring access control.
Dropping a task owner role¶
When the owner role of a given task (i.e. the role with the OWNERSHIP privilege on the task) is deleted, the task is “re-possessed” by the role that dropped the owner role. This ensures that ownership moves to a role that is closer to the root of the role hierarchy. When a task is re-possessed, it is automatically paused, i.e., all executions currently in flight complete processing, but new executions will not be scheduled until the task is resumed explicitly by the new owner. The rationale for this is to prevent a user with access to a particular role from leaving behind tasks that suddenly execute with higher permissions when the role is removed.
If the role that a running task is executing under is dropped while the task is running, the task completes processing under the dropped role.
Task workflow¶
This section provides a high-level overview of the task setup workflow.
Complete the steps in Creating a Task Administrator Role (in this topic) to create a role that can be used to execute the commands in the following steps.
Create a task using CREATE TASK. The task is suspended by default.
Verify the SQL statement that you will reference in a task executes as expected before you create the task. Tasks are intended to automate SQL statements or stored procedures that have already been tested thoroughly.
Execute ALTER TASK … RESUME to allow the task to run based on the parameters specified in the task definition.