Introduction to Tasks¶
Currently, a task can execute a single SQL statement, including a call to a stored procedure.
Tasks can be combined with table streams for continuous ELT workflows to process recently changed table rows. Streams ensure exactly once semantics for new or changed data in a table.
Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or perform other periodic work.
In this Topic:
Snowflake ensures only one instance of a task with a schedule (i.e. a standalone task or the root task in a tree of tasks) is executed at a given time. If a task is still running when the next scheduled execution time occurs, then that scheduled time is skipped.
Choosing a Warehouse¶
Suggested best practices when configuring warehouses are described in Warehouse Considerations. We recommend that you analyze the average run time for a given task or tree of tasks using a specific warehouse based on warehouse size and clustering, as well as whether or not the warehouse is shared by multiple processes or is dedicated to running this single task (or tree of tasks).
Query the TASK_HISTORY table function in the Information Schema. The average difference between the scheduled and completed times for a task is the expected average run time for the task, including any period in which the task was queued. A task is queued when other processes are currently using all of the servers in the warehouse.
Unless the SQL statements defined for the tasks can be optimized (either by rewriting the statements or using stored procedures), then this would be the expected average run time for the task (or tree of tasks). Choose the right size for the warehouse based on your analysis to ensure the task (or tree of tasks) finishes running within this window.
The following diagram shows a window of 1 minute in which a single task queued for 20 seconds and then ran for 40 seconds.
The following diagram shows a tree of tasks that requires 5 minutes on average to complete for each run. The diagram shows the window for 2 runs of the tree of tasks to complete. This window is calculated from the time the root task is scheduled to start until the last child task in the tree has completed running. In this example, the tree of tasks is shared with other, concurrent operations that queue while each of the 3 tasks in the tree is running. These concurrent operations consume all available resources when each task in the tree finishes running and before the next task starts running. As a result, the window for each task includes some amount of queuing while it waits for other operations to finish and relinquish servers.
Note that even if this tree of tasks ran on a dedicated warehouse, a brief lag would be expected after a parent task finishes running and any child task is executed; however, no queueing for shared resources with other operations would occur. The warehouse size you choose should be large enough to accommodate multiple child tasks that are triggered simultaneously by parent tasks.
Task Scheduling and Daylight Saving Time¶
The cron expression in a task definition supports specifying a time zone. A scheduled task runs according to the specified cron expression in the local time for a given time zone. Special care should be taken with regard to scheduling tasks for time zones that recognize daylight saving time. Tasks scheduled during specific times on days when the transition from standard time to daylight saving time (or the reverse) occurs can have unexpected behaviors.
During the autumn change from daylight saving time to standard time, a task scheduled to start at 1 AM in the America/Los_Angeles time zone (i.e.
0 1 * * * America/Los_Angeles) would run twice: once at 1 AM and then again when 1:59:59 AM shifts to 1:00:00 AM local time. That is, there are two points in time when the local time is 1 AM.
During the spring change from standard time to daylight saving time, a task scheduled to start at 2 AM in the America/Los_Angeles time zone (i.e.
0 2 * * * America/Los_Angeles) would not run at all because the local time shifts from 1:59:59 AM to 3:00:00 AM. That is, there is no point during that day when the local time is 2 AM.
To avoid unexpected task executions due to daylight saving time, either:
Do not schedule tasks to run at a specific time between 1 AM and 3 AM (daily, or on days of the week that include Sundays), or
Manually adjust the cron expression for tasks scheduled during those hours twice each year to compensate for the time change due to daylight saving time.
Simple Tree of Tasks¶
Users can define a simple tree-like structure of tasks that starts with a root task and is linked together by task dependencies. The current implementation supports a single path between any two nodes; i.e. an individual task can have only a single predecessor (parent) task. This differs from a Directed Acyclic Graph (DAG) structure, in which a single node can have multiple parents.
A predecessor task can be defined when creating a task (using CREATE TASK … AFTER) or later (using ALTER TASK … ADD AFTER). The root task in the tree should have a defined schedule, while each of the other tasks in the tree have a defined predecessor to link them together. When a run of a predecessor task finishes successfully, it triggers the run of any child tasks that identify this task as the predecessor in their definition.
A simple tree of tasks is limited to a maximum of 1000 tasks total (including the root task) in a resumed state. An individual task in the tree is limited to a single predecessor task; however, a task can have a maximum of 100 child tasks (i.e. other tasks that identify the task as a predecessor).
Currently, we cannot guarantee that only one instance of a task with a defined predecessor task is running at a given time.
A brief lag occurs after a parent task finishes running and any child task is executed.
Simple Tree of Tasks and Task Ownership¶
All tasks in a simple tree must have the same task owner (i.e. a single role must have the OWNERSHIP privilege on all of the tasks in the tree) and be stored in the same database and schema.
When ownership of all tasks in a tree of tasks is transferred at once, through either of the following activities, the links between all tasks in the tree are retained:
The current owner of all tasks that compose the tree of tasks is dropped (using DROP ROLE). Ownership of the objects owned by the dropped role is transferred to the role that executes the DROP ROLE command.
Ownership of all tasks that compose the tree of tasks is explicitly transferred to another role (e.g. by executing GRANT OWNERSHIP on all tasks in a schema).
Versioning of Task Tree Runs¶
When the root task starts a scheduled run, a version of the entire tree of tasks is established. All properties of all tasks in the tree are set. The entire tree of tasks completes its current run using the properties for this set version.
Before a DDL statement can be executed on any task in a tree of tasks, the root task must be suspended (using ALTER TASK … SUSPEND). When the root task is suspended, all future scheduled runs of the root task are cancelled; however, if any tasks are currently running (i.e, the tasks in an an EXECUTING state), these tasks and any descendent tasks continue to run.
After task properties in the tree are modified and the root task is resumed, those changes are not applied until the next scheduled run. At that time, a new version of the tree of tasks is set.
If the definition of a stored procedure called by a task changes while the tree of tasks is executing, the new programming could be executed when the stored procedure is called by the task in the current run.
For example, suppose the owner of the following tree of tasks, or another role with the appropriate privileges, has suspended the root task (Task A), but a scheduled run of the task has already started. The tree owner modifies the SQL statement called by Task B (a child task) while the root task is still running. A version of the entire tree is set when the root task starts its current run. Task B executes the SQL statement (or calls the stored procedure) in its definition in the version of the tree of tasks when the root task started running.
When the root task is resumed, a new version of the tree of tasks is set when the root task starts its next run. This new version includes the modification to Task B.
Setting Session Parameters for Tasks¶
You can set session parameters for the session in which a task runs. To do so, modify an existing task and set the desired parameter values (using ALTER TASK …
SET session_parameter = value[, session_parameter = value ... ]).
A task supports all session parameters. For the complete list, see Parameters.
Note that a task does not support account or user parameters.
Viewing the Task History for Your Account¶
The following roles (or roles with the specified privileges) can use SQL to view the task history within a specified date range:
Account administrator (i.e. users with the ACCOUNTADMIN role).
Task owner (i.e. role that has the OWNERSHIP privilege on a task).
Any role that has the global MONITOR EXECUTION privilege.
To view the task history:
Understanding the System Service¶
Snowflake runs tasks with the privileges of the task owner (i.e. the role that has OWNERSHIP privilege on the task), but task runs are not associated with a user. Instead, each run is executed by a system service. Tasks are decoupled from specific users to avoid complications that can arise when users are dropped, locked due to authentication issues, or have roles removed.
Because task runs are decoupled from a user, the query history for task runs are associated with the system service. SYSTEM is not a user in the account; it is a behind-the-scenes service. As such, there are no user credentials for this service, and no individual (from Snowflake or in your account) can assume its identity. Activity for the system service is limited to your account. The same encryption protections and other security protocols are built into this service as are enforced for other operations.
To support creating and managing tasks, Snowflake provides the following set of special DDL commands:
In addition, providers can view, grant, or revoke access to the necessary database objects for ELT using the following standard access control DDL:
To support retrieving information about tasks, Snowflake provides the following set of SQL functions:
Required Access Privileges¶
Creating, managing, and executing tasks requires a role with a minimum of the following role permissions:
Required to run any tasks the role owns. Revoking the EXECUTE TASK privilege on a role prevents all subsequent task runs from starting under that role.
USAGE, CREATE TASK
In addition, the role must have the permissions required to run the SQL statement executed by the task.
Creating a Task Administrator Role¶
For ease of use, we recommend creating a custom role (e.g. TASKADMIN) and assigning the EXECUTE TASK privilege to this role. Any role that can grant privileges (e.g. SECURITYADMIN or any role with the MANAGE GRANTS privilege) can then grant this custom role to any task owner role to allow altering their own tasks. To remove the ability for the task owner role to execute the task, it is only necessary to revoke this custom role from the task owner role. Note that if you choose not to create this custom role, an account administrator must revoke the EXECUTE TASK privilege from the task owner role.
For example, create a custom role name TASKADMIN and grant that role the EXECUTE TASK privilege. Assign the TASKADMIN role to a task owner role named MYROLE:
USE ROLE securityadmin; CREATE ROLE taskadmin; -- set the active role to ACCOUNTADMIN before granting the EXECUTE TASK privilege to the new role USE ROLE accountadmin; GRANT EXECUTE TASK ON ACCOUNT TO ROLE taskadmin; -- set the active role to SECURITYADMIN to show that this role can grant a role to another role USE ROLE securityadmin; GRANT ROLE taskadmin TO ROLE myrole;
For more information on creating custom roles and role hierarchies, see Configuring Access Control.
Dropping a Task Owner Role¶
When the owner role of a given task (i.e. the role with the OWNERSHIP privilege on the task) is deleted, the task is “re-possessed” by the role that dropped the owner role. This ensures that ownership moves to a role that is closer to the root of the role hierarchy. When a task is re-possessed, it is automatically paused, i.e., all executions currently in flight complete processing, but new executions will not be scheduled until the task is resumed explicitly by the new owner. The rationale for this is to prevent a user with access to a particular role from leaving behind tasks that suddenly execute with higher permissions when the role is removed.
If the role that a running task is executing under is dropped while the task is running, the task completes processing under the dropped role.
This section provides a high-level overview of the task setup workflow.
Complete the steps in Creating a Task Administrator Role (in this topic) to create a role that can be used to execute the commands in the following steps.
Create a task using CREATE TASK. The task is suspended by default.
Verify the SQL statement that you will reference in a task executes as expected before you create the task. Tasks are intended to automate SQL statements or stored procedures that have already been tested thoroughly.
Execute ALTER TASK … RESUME to allow the task to run based on the parameters specified in the task definition.