Introduction to tasks¶
Tasks use user defined functions to automate and schedule business processes. With a single task you can perform a simple to complex function in your data pipeline. With task graphs you can put multiple tasks together to create data pipelines to handle complex use cases. You can also combine tasks with table streams for continuous ELT workflows to process recently changed data.
A task can execute any one of the following types of functions:
Single SQL statement
Call to a stored procedure
Procedural logic using Snowflake Scripting
You can use Triggered Tasks with table streams for continuous ELT workflows to process recently changed table rows.
Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or perform other periodic work. To run complex business processes with tasks, consider using task graphs to chain multiple tasks together.
Note
Table schema evolution is not supported by tasks.
Task creation workflow¶
This section provides an overview of the task setup workflow.
Create a task administrator role that can run the commands in the following steps.
Create a task using CREATE TASK. The task is suspended by default. Review the following sections for more information about creating tasks:
Test your tasks using manual task execution.
Execute ALTER TASK … RESUME to allow the task to run based on the parameters specified in the task definition. Review the following sections for more information about task runs:
Compute resources¶
Tasks require compute resources to execute SQL code. Either of the following compute models can be chosen for individual tasks:
Serverless compute model
User-managed virtual warehouse
Serverless tasks¶
The serverless compute model for tasks enables you to rely on compute resources managed by Snowflake instead of user-managed virtual warehouses. Snowflake automatically resizes serverless compute resources as required for each workload. Snowflake determines the ideal size of serverless compute resources for a given run based on a dynamic analysis of statistics for the most recent runs of the same task. The maximum compute size for a serverless task is equivalent to an XXLARGE user-managed virtual warehouse.
To use the serverless compute model, leave out the WAREHOUSE parameter when you create a task using CREATE TASK. Note that the role that executes the CREATE TASK command must have the global EXECUTE MANAGED TASK privilege. For more information about the access control requirements for tasks, see Task security.
Billing for runs of serverless tasks is different from the standard credit consumption model for tasks that rely on virtual warehouses for compute resources. For more information, see Task costs.
You can take some control over the cost and performance of serverless tasks by setting the following parameters:
SERVERLESS_TASK_MAX_STATEMENT_SIZE: the maximum size of warehouse for running on serverless tasks to prevent unexpected costs.
SERVERLESS_TASK_MIN_STATEMENT_SIZE: the minimum size of warehouse for predictable performance on serverless tasks.
TARGET_COMPLETION_INTERVAL: the desired task completion time. Snowflake automatically resizes serverless compute resources to complete tasks within the time specified in the target completion interval. When this value is not set for scheduled tasks, Snowflake resizes serverless compute resources to complete before the next scheduled execution time.
If a series of task executions complete within the value specified on TARGET_COMPLETION_INTERVAL or a factor less, Snowflake scales down serverless compute to the next size so that subsequent task executions complete within the specified TARGET_COMPLETION_INTERVAL.
If a series of task executions are a factor greater than TARGET_COMPLETION_INTERVAL, Snowflake scales up serverless compute to the next size so that subsequent task executions complete within the specified TARGET_COMPLETION_INTERVAL.
When these three settings are specified, the precedence is as follows:
SERVERLESS_TASK_MAX_STATEMENT_SIZE
SERVERLESS_TASK_MIN_STATEMENT_SIZE
TARGET_COMPLETION_INTERVAL
Note
Set the SERVERLESS_TASK_MIN_STATEMENT_SIZE and SERVERLESS_TASK_MAX_STATEMENT_SIZE parameters only on the task. Setting these at higher levels (schema, database, account) might impact the behavior of some Snowflake features.
Here are some examples of using these parameters to control the serverless tasks.
Example 1: Create a serverless task to run every hour with a target completion interval of 120 minutes:
CREATE TASK SCHEDULED_T1 SCHEDULE='USING CRON 0 * * * * America/Los_Angeles'
TARGET_COMPLETION_INTERVAL='120 MINUTE' AS SELECT 1;
By setting TARGET_COMPLETION_INTERVAL with a relatively long duration, this indicates it’s acceptable for the task execution to run longer. Snowflake automatically sizes serverless compute so that the task execution completes within the specified TARGET_COMPLETION_INTERVAL.
Example 2: Create a serverless task to run every hour with a target completion time of 10 minutes:
CREATE TASK SCHEDULED_T2 SCHEDULE='USING CRON 0 * * * * UTC'
TARGET_COMPLETION_INTERVAL='10 MINUTE'
SERVERLESS_TASK_MAX_STATEMENT_SIZE='LARGE'
AS SELECT 1;
By setting TARGET_COMPLETION_INTERVAL with a relatively short duration, this indicates a preference for the task to quickly complete. Snowflake automatically sizes serverless compute so that the task execution completes within the specified TARGET_COMPLETION_INTERVAL until the SERVERLESS_TASK_MAX_STATEMENT_SIZE is reached.
Example 3: Create a serverless task scheduled once a day with a target completion interval and range of minimum and maximum warehouse sizes:
CREATE TASK SCHEDULED_T3 SCHEDULE='USING CRON 0 0 * * * UTC'
TARGET_COMPLETION_INTERVAL='180 M'
SERVERLESS_TASK_MIN_STATEMENT_SIZE='MEDIUM' SERVERLESS_TASK_MAX_STATEMENT_SIZE='LARGE' AS SELECT 1;
With this configuration, the serverless task runs once a day at midnight on at least a medium warehouse with a TARGET_COMPLETION_INTERVAL of 3 hours. Snowflake automatically sizes serverless compute so that the task execution completes within the specified TARGET_COMPLETION_INTERVAL until the SERVERLESS_TASK_MAX_STATEMENT_SIZE is reached.
Example 4: Create a serverless task scheduled once a day without a target completion interval:
CREATE TASK SCHEDULED_T4 SCHEDULE='USING CRON 0 0 * * * UTC'
SERVERLESS_TASK_MAX_STATEMENT_SIZE='LARGE' AS SELECT 1;
With this configuration, the serverless task runs once a day at midnight. If a series of task execution durations don’t complete before the next scheduled time, Snowflake automatically sizes serverless compute so that the task execution completes within the specified SCHEDULE until the SERVERLESS_TASK_MAX_STATEMENT_SIZE is reached.
For more information about the parameters and syntax, see CREATE TASK and ALTER TASK.
User-managed tasks¶
You can alternatively manage the compute resources for individual tasks by specifying an existing virtual warehouse when you create the task. This option requires that you choose a warehouse that is sized appropriately for the SQL actions that are executed by the task.
Choosing a warehouse size¶
If you choose to use existing warehouses to supply the compute resources for individual tasks, follow the best practices described in Warehouse considerations. To understand compute needs for your task, analyze the average run time for a single task or task graph using different warehouses based on warehouse size and clustering. You should also consider whether the warehouse is shared by multiple processes.
To analyze the average run time of your tasks, query the TASK_HISTORY account usage view. The average difference between the scheduled and completed times for a task is the expected average run time for the task. This difference includes the time the task was queued while other processes are using the compute resources in the warehouse.
For task graphs, there is a brief lag after a predecessor task finishes running and any child task starts. Choose a warehouse size large enough to accommodate multiple child tasks running simultaneously.
The following diagram shows a window of 1 minute in which a single task queued for 20 seconds and then ran for 40 seconds. This means that other processes were using the warehouse resources for the first 20 seconds after the task was scheduled.
The following diagram shows a task graph that requires 5 minutes on average to complete for each run. The diagram shows the window for 2 runs of the task graph to complete. This window is calculated from the time the root task is scheduled to start until the last child task has completed running.
In this example, the warehouse the task graph is running on is shared with other concurrent operations. These concurrent operations consume all available resources when each task in the task graph finishes running and before the next task starts running. As a result, the window for each task includes some amount of queuing while it waits for other operations to finish and free up compute resources.
Recommendations for choosing a compute model¶
The following table describes various factors that can help you decide when to use serverless tasks versus user-managed tasks:
Category |
Serverless Tasks |
User-managed Tasks |
Notes |
---|---|---|---|
Number, duration, and predictability of concurrent task workloads |
Recommended for under-utilized warehouses with too few tasks running concurrently, or completing quickly. Tasks with relatively stable runs are good candidates for serverless tasks. |
Recommended for fully utilized warehouses with multiple concurrent tasks. Also recommended for unpredictable loads on compute resources. Multi-cluster warehouses with auto-suspend and auto-resume enabled could help moderate your credit consumption. |
For serverless tasks, Snowflake bills your account based on the actual compute resource usage. For user-managed tasks, billing for warehouses is based on warehouse size, with a 60-second minimum each time the warehouse is resumed. |
Schedule interval |
Recommended when adherence to the schedule interval is highly important. If a run of a standalone task or scheduled task graph exceeds the interval, Snowflake increases the size of the compute resources. |
Recommended when adherence to the schedule interval is less important. |
Schedule interval refers to the interval of time between scheduled executions of a standalone task or the root task in a task graph. Increasing the compute resources can reduce the execution time of some, but not all, SQL code but doesn’t ensure a task run is completed within the batch window. |
The maximum size for a serverless task run is equivalent to an XXLARGE warehouse. If a task workload requires a larger warehouse, create a user-managed task with a warehouse of the required size.
Running tasks¶
This section describes the different ways that a task can be scheduled and run, how task failures are handled, and how the version of a task is determined.
Task scheduling¶
Tasks generally run on a schedule. You can define the schedule when creating a task using CREATE TASK or later using ALTER TASK.
A standalone task or the root task in a task graph generally runs on a schedule. You can define the schedule when creating a task using CREATE TASK or later using ALTER TASK.
Snowflake ensures only one instance of a task with a schedule is executed at a time. If a task is still running when the next scheduled execution time occurs, then that scheduled time is skipped.
Snowflake automatically resizes and scales the compute resources for serverless tasks. For user-managed tasks, choose an appropriate warehouse size for the task to complete its workload within the schedule. For information, see Choosing a Warehouse Size.
Task scheduling and daylight saving time¶
The cron expression in a task definition supports specifying a time zone. Tasks scheduled when the transition from standard time to daylight saving time, or the reverse, occurs can have unexpected behaviors.
For example:
During the change from daylight saving time to standard time, a task scheduled to start at 1 AM in the America/Los_Angeles time zone (
0 1 * * * America/Los_Angeles
) would run twice. At 1 AM and then again when 1:59:59 AM shifts to 1:00:00 AM local time.During the change from standard time to daylight saving time, a task scheduled to start at 2 AM in the America/Los_Angeles time zone (
0 2 * * * America/Los_Angeles
) would not run because the local time shifts from 1:59:59 AM to 3:00:00 AM.
To avoid unexpected task executions due to daylight saving time, consider the following:
Don’t schedule tasks to run between 1 AM and 3 AM.
Manually adjust the cron expression for tasks scheduled between 1 AM and 3 AM twice each year to compensate for the time change.
Use a time format that does not apply daylight savings time, such as UTC.
Triggered Tasks¶
You can use Triggered Tasks to run only when a defined stream has new data. This eliminates the need to poll a source frequently when the availability of new data is unpredictable. It also reduces latency because data is processed immediately.
Triggered Tasks don’t use compute resources until the defined stream has data and triggers the task run.
The following are run conditions for Triggered Tasks:
When data is changed in the table that the associated stream tracks.
When the task is first resumed, to consume any data already in the stream.
Triggered Tasks automatically run a health check every 12 hours to prevent the stream from becoming stale. If no data is in the stream, Snowflake skips the run without using compute resources.
Creating Triggered Tasks¶
To create a new triggered task, omit the SCHEDULE
parameter and include the target stream in the WHEN
clause.
CREATE TASK triggeredTask WAREHOUSE = my_warehouse
WHEN system$stream_has_data('my_stream')
AS
INSERT INTO my_downstream_table
SELECT * FROM my_stream;
ALTER TASK triggeredTask RESUME;
To migrate an existing task from a scheduled task to a triggered task, unset the SCHEDULE
parameter. The existing task must have a
target stream defined in the WHEN
clause.
ALTER TASK task SUSPEND;
ALTER TASK task UNSET SCHEDULE;
ALTER TASK task RESUME;
The following are details and limitations on Triggered Tasks parameters:
By default, Triggered Tasks run at most every 30 seconds. You can modify the USER_TASK_MINIMUM_TRIGGER_INTERVAL_IN_SECONDS parameter to run more frequently, up to every 10 seconds.
The triggered task
WHEN
conditions must be based on data changing.The when conditional supports the use of
AND
andOR
conditionals. TheAND
conditional can result in a skipped task if only one of the defined streams has data.
Triggered Tasks considerations¶
The following are details about managing, configuring, and monitoring Triggered Tasks:
In the
SHOW TASKS
andDESC TASK
output, theSCHEDULE
property displaysNULL
for Triggered Tasks.In the output of the task_history view of the information_schema and account_usage schemas, the SCHEDULED_FROM column displays TRIGGER.
If the stream or table that the stream is tracking is dropped or re-created, the triggered task automatically suspends. After the table or stream is re-created, the user can run
ALTER TASK <task_name> RESUME
to resume triggered processing.
Triggered Tasks Limitations¶
The following are limitations of Triggered Tasks:
Streams on Data Shares, Directory Tables, External Tables, and Hybrid Tables are not supported.
Serverless Tasks are not supported.
Manually executing tasks¶
The EXECUTE TASK command manually triggers a single run of a task. This SQL command is useful for testing new or modified tasks before you enable them to execute SQL code in production.
You can call this SQL command directly in scripts or in stored procedures. In addition, this command supports integrating tasks in external data pipelines. Any third-party services that can authenticate into your Snowflake account and authorize SQL actions can execute the EXECUTE TASK command to run tasks.
Versioning of task runs¶
When a standalone task is first resumed or manually executed, an initial version of the task is set. The standalone task runs using this version. After a task is suspended and modified, a new version is set when the standalone task is resumed or manually executed.
When the task is suspended, all future scheduled runs of the task are cancelled; however, currently running tasks continue to run using the current version.
For example, suppose the task is suspended, but a scheduled run of this task has already started. The owner of the task modifies the SQL code called by the task while the task is still running. The task runs and executes the SQL code in its definition using the version of the task that was current when the task started its run. When the task is resumed or is manually executed, a new version of the task is set. This new version includes the modifications to the task.
To retrieve the history of task versions, query TASK_VERSIONS Account Usage view (in the SNOWFLAKE shared database).
Automatically suspend tasks after failed runs¶
Optionally suspend tasks automatically after a specified number of consecutive runs that either fail or time out. This feature can reduce costs by suspending tasks that consume Snowflake credits but fail to run to completion.
Set the SUSPEND_TASK_AFTER_NUM_FAILURES = num
parameter on a task. When the parameter
is set to a value greater than 0
, tasks are automatically suspended after the specified number of consecutive task runs either fail or time out.
The parameter can be set when creating a task using CREATE TASK or later using ALTER TASK. You can also change this value in Snowsight.
The SUSPEND_TASK_AFTER_NUM_FAILURES parameter can also be set at the account, database, or schema level. The setting applies to all tasks contained in the modified object. Note that explicitly setting the parameter at a lower level overrides the parameter value set at a higher level.
Automatically retry failed task runs¶
If any task completes in a FAILED state, Snowflake can automatically retry the task. The automatic task retry is disabled by default. To enable this feature, set TASK_AUTO_RETRY_ATTEMPTS to a value greater than 0.
Tasks that use error notifications send notifications for each failed retry attempt. For more information, see Configuring a task to send error notifications.
When you set the TASK_AUTO_RETRY_ATTEMPTS parameter value at the account, database, or schema level, the change is applied to tasks contained in the modified object during their next scheduled run.
Setting session parameters for tasks¶
You can set session parameters for the session in which a task runs. To do so, modify an existing task and set the desired parameter values
using ALTER TASK … SET session_parameter = value[, session_parameter = value ... ]
or edit the
task in Snowsight.
A task supports all session parameters. For the complete list, see Parameters. Tasks don’t support account or user parameters.
Viewing the task history for your account¶
You can view the task history for your account using SQL or Snowsight. To view task history in Snowsight, refer to Viewing tasks and task graphs in Snowsight. For information about required privileges, see Viewing task history.
To view the run history for a single task:
- SQL:
Query the TASK_HISTORY table function (in the Snowflake Information Schema).
To view details on a task graph run that is currently scheduled or is executing:
- SQL:
Query the CURRENT_TASK_GRAPHS table function (in the Snowflake Information Schema).
To view the history for task graph runs that executed successfully, failed, or were cancelled in the past 60 minutes:
- SQL:
Query the COMPLETE_TASK_GRAPHS table function (in the Snowflake Information Schema).
Query the COMPLETE_TASK_GRAPHS view view (in Account Usage).
Task costs¶
The costs associated with running a task to execute SQL code differ depending on the source of the compute resources for the task:
- User-managed warehouse
Snowflake bills your account for credit usage based on warehouse usage while a task is running, similar to the warehouse usage for executing the same SQL statements in a client or the Snowflake web interface. Per-second credit billing and warehouse auto-suspend give you the flexibility to start with larger warehouse sizes and then adjust the size to match your task workloads.
- Serverless compute model
Snowflake bills your account based on compute resource usage. Charges are calculated based on your total usage of the resources, including cloud service usage, measured in compute-hours credit usage. The compute-hours cost changes based on warehouse size and query runtime. For more information, see Serverless credit usage or Query: Total serverless task cost.
Snowflake analyzes task runs in the task history to dynamically determine the correct size and number of the serverless compute resources. As Snowflake automatically scales up and down resources to manage your task runs, the cost to run the task runs scales proportionally.
To learn how many credits are consumed by tasks, refer to the “Serverless Feature Credit Table” in the Snowflake Service Consumption Table.
Consider the following best practices to optimize for cost when you create tasks:
Set the SCHEDULE to run less frequently.
Use the auto-suspend and auto-retry parameters to prevent resource waste on failing tasks.
Create a budget and alert on spend limits for serverless features. For more information, see Monitor credit usage with budgets.
To retrieve the current credit usage for a specific task, query the SERVERLESS_TASK_HISTORY table function. Execute the following statement as the task owner, where
<database_name>
is the database that contains the task and<task_name>
is the name of the task:SET num_credits = (SELECT SUM(credits_used) FROM TABLE(<database_name>.information_schema.serverless_task_history( date_range_start=>dateadd(D, -1, current_timestamp()), date_range_end=>dateadd(D, 1, current_timestamp()), task_name => '<task_name>') ) );
To retrieve the current credit usage for all serverless tasks, query the SERVERLESS_TASK_HISTORY view. Execute the following statement as an account administrator:
SELECT start_time, end_time, task_id, task_name, credits_used, schema_id, schema_name, database_id, database_name FROM snowflake.account_usage.serverless_task_history ORDER BY start_time, task_id;
Task security¶
To get started with tasks, you must have the correct access privileges. This section describes how to manage access to tasks.
For information about task graph ownership, see Manage task graph ownership.
Access control privileges¶
Creating tasks¶
Creating tasks requires a role with a minimum of the following privileges:
Object |
Privilege |
Notes |
---|---|---|
Account |
EXECUTE MANAGED TASK |
Required only for tasks that rely on serverless compute resources. |
Database |
USAGE |
|
Schema |
USAGE, CREATE TASK |
|
Warehouse |
USAGE |
Required only for tasks that rely on user-managed warehouses. |
Running tasks¶
After a task is created, the task owner must have the following privileges for the task to run:
Object |
Privilege |
Notes |
---|---|---|
Account |
EXECUTE TASK |
Required to run any tasks the role owns. Revoking the EXECUTE TASK privilege on a role prevents all subsequent task runs from starting under that role. |
Account |
EXECUTE MANAGED TASK |
Required only for tasks that rely on serverless compute resources. |
Database |
USAGE |
|
Schema |
USAGE |
|
Task |
USAGE |
|
Warehouse |
USAGE |
Required only for tasks that rely on user-managed warehouses. |
In addition, the role must have the permissions required to run the SQL statement executed by the task.
Viewing task history¶
To view tasks, you must have one or more of the following privileges:
The ACCOUNTADMIN role
The OWNERSHIP privilege on the task
The global MONITOR EXECUTION privilege
Resuming or suspending tasks¶
In addition to the task owner, a role that has the OPERATE privilege on the task can suspend or resume the task. This role must have the USAGE privilege on the database and schema that contain the task. No other privileges are required.
When a task is resumed, Snowflake verifies that the task owner role has the privileges listed in Running tasks.
Create custom roles to manage task permissions¶
With custom roles you can easily manage permissions granted to each account or role in Snowflake. To make changes to permissions for all accounts or roles using the custom role, update the custom role. Or, revoke permissions by removing the custom role.
Create a custom role to create tasks¶
Snowflake requires different permissions to create serverless and user-managed tasks.
For example, to create user-managed tasks, create a custom role named warehouse_task_creation
and grant that role the CREATE TASK and USAGE privileges on the warehouse that the role can create tasks in.
USE SYSADMIN;
CREATE ROLE warehouse_task_creation
COMMENT = 'This role can create user-managed tasks.';
USE ACCOUNTADMIN;
GRANT CREATE TASK
ON SCHEMA schema1
TO ROLE warehouse_task_creation;
GRANT USAGE
ON WAREHOUSE warehouse1
TO ROLE warehouse_task_creation;
As an example of a role that can create serverless tasks; create a custom role named serverless_task_creation
and grant the role the CREATE TASK privilege and the account level EXECUTE MANAGED TASK privilege.
USE SYSADMIN;
CREATE ROLE serverless_task_creation
COMMENT = 'This role can create serverless tasks.';
USE ACCOUNTADMIN;
GRANT CREATE TASK
ON SCHEMA schema1
TO ROLE serverless_task_creation;
GRANT EXECUTE MANAGED TASK ON ACCOUNT
TO ROLE serverless_task_creation;
Create a custom role to administer tasks¶
Create a custom role and with the EXECUTE TASK privilege and grant this custom role to any task owner role to allow altering their own tasks. To remove the ability for the task owner role to execute the task, revoke this custom role from the task owner role.
For example, create a custom role name taskadmin
and grant that role the EXECUTE TASK privilege. Assign the taskadmin
role to a
task owner role named myrole
:
USE ROLE securityadmin;
CREATE ROLE taskadmin;
Set the active role to ACCOUNTADMIN before granting the account-level privileges to the new role
USE ROLE accountadmin;
GRANT EXECUTE TASK, EXECUTE MANAGED TASK ON ACCOUNT TO ROLE taskadmin;
Set the active role to SECURITYADMIN to show that this role can grant a role to another role
USE ROLE securityadmin;
GRANT ROLE taskadmin TO ROLE myrole;
For more information on creating custom roles and role hierarchies, see Configuring access control.
Drop a task owner role¶
When you delete the owner role of a task, the task transfers ownership to the role that dropped the owner role. When a task transfers ownership, it is automatically paused and new executions aren’t scheduled until the new owner resumes the task.
If you drop the role while the task is running, the task run completes processing under the dropped role.
System service task execution¶
Snowflake runs tasks with the privileges of the task owner, but task runs are not associated with a user. Instead, each run is executed by a system service. Tasks are decoupled from specific users to avoid complications that can arise when users are dropped, locked due to authentication issues, or have roles removed.
Because task runs are decoupled from a user, the query history for task runs are associated with the system service. As such, there are no user credentials for this service, and no individual can assume its identity. Activity for the system service is limited to your account. The same encryption protections and other security protocols are built into this service as are enforced for other operations.
Task DDL¶
To support creating and managing tasks, Snowflake provides the following set of special DDL commands:
In addition, providers can view, grant, or revoke access to the necessary database objects for ELT using the following standard access control DDL:
Task functions¶
To support retrieving information about tasks, Snowflake provides the following set of SQL functions: