Introduction to tasks¶

Tasks use user defined functions to automate and schedule business processes. With a single task you can perform a simple to complex function in your data pipeline. With task graphs you can put multiple tasks together to create data pipelines to handle complex use cases. You can also combine tasks with table streams for continuous ELT workflows to process recently changed data.

A task can execute any one of the following types of functions:

You can use Triggered Tasks with table streams for continuous ELT workflows to process recently changed table rows.

Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or perform other periodic work. To run complex business processes with tasks, consider using task graphs to chain multiple tasks together.

Note

Table schema evolution is not supported by tasks.

Task creation workflow¶

This section provides an overview of the task setup workflow.

  1. Create a task administrator role that can run the commands in the following steps.

  2. Create a task using CREATE TASK. The task is suspended by default. Review the following sections for more information about creating tasks:

  3. Test your tasks using manual task execution.

  4. Execute ALTER TASK … RESUME to allow the task to run based on the parameters specified in the task definition. Review the following sections for more information about task runs:

Compute resources¶

Tasks require compute resources to execute SQL code. Either of the following compute models can be chosen for individual tasks:

  • Serverless compute model

  • User-managed virtual warehouse

Serverless tasks¶

The serverless compute model for tasks enables you to rely on compute resources managed by Snowflake instead of user-managed virtual warehouses. Snowflake automatically resizes serverless compute resources as required for each workload. Snowflake determines the ideal size of serverless compute resources for a given run based on a dynamic analysis of statistics for the most recent runs of the same task. The maximum compute size for a serverless task is equivalent to an XXLARGE user-managed virtual warehouse. Multiple workloads in your account share a common set of compute resources.

To use the serverless compute model, leave out the WAREHOUSE parameter when you create a task using CREATE TASK. Note that the role that executes the CREATE TASK command must have the global EXECUTE MANAGED TASK privilege. For more information about the access control requirements for tasks, see Task security.

Billing for runs of serverless tasks is different from the standard credit consumption model for tasks that rely on virtual warehouses for compute resources. For more information, see Task costs.

User-managed tasks¶

You can alternatively manage the compute resources for individual tasks by specifying an existing virtual warehouse when you create the task. This option requires that you choose a warehouse that is sized appropriately for the SQL actions that are executed by the task.

Choosing a warehouse size¶

If you choose to use existing warehouses to supply the compute resources for individual tasks, follow the best practices described in Warehouse considerations. To understand compute needs for your task, analyze the average run time for a single task or task graph using different warehouses based on warehouse size and clustering. You should also consider whether the warehouse is shared by multiple processes.

To analyze the average run time of your tasks, query the TASK_HISTORY account usage view. The average difference between the scheduled and completed times for a task is the expected average run time for the task. This difference includes the time the task was queued while other processes are using the compute resources in the warehouse.

For task graphs, there is a brief lag after a predecessor task finishes running and any child task starts. Choose a warehouse size large enough to accommodate multiple child tasks running simultaneously.

The following diagram shows a window of 1 minute in which a single task queued for 20 seconds and then ran for 40 seconds. This means that other processes were using the warehouse resources for the first 20 seconds after the task was scheduled.

Example task batch window

The following diagram shows a task graph that requires 5 minutes on average to complete for each run. The diagram shows the window for 2 runs of the task graph to complete. This window is calculated from the time the root task is scheduled to start until the last child task has completed running.

In this example, the warehouse the task graph is running on is shared with other concurrent operations. These concurrent operations consume all available resources when each task in the task graph finishes running and before the next task starts running. As a result, the window for each task includes some amount of queuing while it waits for other operations to finish and free up compute resources.

Example task graph batch window

Recommendations for choosing a compute model¶

The following table describes various factors that can help you decide when to use serverless tasks versus user-managed tasks:

Category

Serverless Tasks

User-managed Tasks

Notes

Number, duration, and predictability of concurrent task workloads

Recommended for under-utilized warehouses with too few tasks running concurrently, or completing quickly.

Tasks with relatively stable runs are good candidates for serverless tasks.

Recommended for fully utilized warehouses with multiple concurrent tasks.

Also recommended for unpredictable loads on compute resources. Multi-cluster warehouses with auto-suspend and auto-resume enabled could help moderate your credit consumption.

For serverless tasks, Snowflake bills your account based on the actual compute resource usage.

For user-managed tasks, billing for warehouses is based on warehouse size, with a 60-second minimum each time the warehouse is resumed.

Schedule interval

Recommended when adherence to the schedule interval is highly important.

If a run of a standalone task or scheduled task graph exceeds the interval, Snowflake increases the size of the compute resources.

Recommended when adherence to the schedule interval is less important.

Schedule interval refers to the interval of time between scheduled executions of a standalone task or the root task in a task graph.

Increasing the compute resources can reduce the execution time of some, but not all, SQL code but doesn’t ensure a task run is completed within the batch window.

The maximum size for a serverless task run is equivalent to an XXLARGE warehouse. If a task workload requires a larger warehouse, create a user-managed task with a warehouse of the required size.

Running tasks¶

This section describes the different ways that a task can be scheduled and run, how task failures are handled, and how the version of a task is determined.

Task scheduling¶

Tasks generally run on a schedule. You can define the schedule when creating a task using CREATE TASK or later using ALTER TASK.

A standalone task or the root task in a task graph generally runs on a schedule. You can define the schedule when creating a task using CREATE TASK or later using ALTER TASK.

Snowflake ensures only one instance of a task with a schedule is executed at a time. If a task is still running when the next scheduled execution time occurs, then that scheduled time is skipped.

Snowflake automatically resizes and scales the compute resources for serverless tasks. For user-managed tasks, choose an appropriate warehouse size for the task to complete its workload within the schedule. For information, see Choosing a Warehouse Size.

Task scheduling and daylight saving time¶

The cron expression in a task definition supports specifying a time zone. Tasks scheduled when the transition from standard time to daylight saving time, or the reverse, occurs can have unexpected behaviors.

For example:

  • During the change from daylight saving time to standard time, a task scheduled to start at 1 AM in the America/Los_Angeles time zone (0 1 * * * America/Los_Angeles) would run twice. At 1 AM and then again when 1:59:59 AM shifts to 1:00:00 AM local time.

  • During the change from standard time to daylight saving time, a task scheduled to start at 2 AM in the America/Los_Angeles time zone (0 2 * * * America/Los_Angeles) would not run because the local time shifts from 1:59:59 AM to 3:00:00 AM.

To avoid unexpected task executions due to daylight saving time, consider the following:

  • Don’t schedule tasks to run between 1 AM and 3 AM.

  • Manually adjust the cron expression for tasks scheduled between 1 AM and 3 AM twice each year to compensate for the time change.

  • Use a time format that does not apply daylight savings time, such as UTC.

Triggered Tasks¶

You can use Triggered Tasks to only run when a defined stream has new data. This simplifies a common use case for frequently polling a source with unpredictable availability of new data, and reduces latency by immediately processing when there is new data.

Triggered Tasks don’t use compute resources until the defined stream has data and triggers the task run.

The following are run conditions for Triggered Tasks:

  • When data is changed in the table that the associated stream tracks.

  • When the task is first resumed, to consume any data already in the stream.

  • Triggered Tasks automatically run a health check every 12 hours to prevent the stream from becoming stale. If there is no data in the stream, Snowflake skips the run without using compute resources.

Creating Triggered Tasks¶

To create a new triggered task, omit the SCHEDULE parameter and include the target stream in the WHEN clause.

CREATE TASK triggeredTask  WAREHOUSE = my_warehouse
  WHEN system$stream_has_data('my_stream')
  AS
    INSERT INTO my_downstream_table
    SELECT * FROM my_stream;

ALTER TASK triggeredTask RESUME;
Copy

To migrate an existing task from a scheduled task to a triggered task, unset the SCHEDULE parameter. The existing task must have a target stream defined in the WHEN clause.

ALTER TASK task SUSPEND;
ALTER TASK task UNSET SCHEDULE;
ALTER TASK task RESUME;
Copy

The following are details and limitations on Triggered Tasks parameters:

  • By default, Triggered Tasks run at most every 30 seconds. You can modify the USER_TASK_MINIMUM_TRIGGER_INTERVAL_IN_SECONDS parameter to run more frequently, up to every 15 seconds.

  • The triggered task WHEN conditions must be based on data changing.

  • The when conditional supports the use of AND and OR conditionals. The AND conditional can result in a skipped task if only one of the defined streams has data.

Triggered Tasks considerations¶

The following are details about managing, configuring, and monitoring Triggered Tasks:

  • In the SHOW TASKS and DESC TASK output, the SCHEDULE property displays NULL for Triggered Tasks.

  • In the output of the task_history view of the information_schema and account_usage schemas, the SCHEDULED_FROM column displays TRIGGER.

  • If the stream or table that the stream is tracking is dropped or re-created, the triggered task automatically suspends. After the table or stream is re-created, the user can run ALTER TASK <task_name> RESUME to resume triggered processing.

Triggered Tasks Limitations¶

The following are limitations of Triggered Tasks:

  • Streams on Data Shares, Directory Tables, External Tables, and Hybrid Tables are not supported.

  • Serverless Tasks are not supported.

Manually executing tasks¶

The EXECUTE TASK command manually triggers a single run of a task. This SQL command is useful for testing new or modified tasks before you enable them to execute SQL code in production.

You can call this SQL command directly in scripts or in stored procedures. In addition, this command supports integrating tasks in external data pipelines. Any third-party services that can authenticate into your Snowflake account and authorize SQL actions can execute the EXECUTE TASK command to run tasks.

Versioning of task runs¶

When a standalone task is first resumed or manually executed, an initial version of the task is set. The standalone task runs using this version. After a task is suspended and modified, a new version is set when the standalone task is resumed or manually executed.

When the task is suspended, all future scheduled runs of the task are cancelled; however, currently running tasks continue to run using the current version.

For example, suppose the task is suspended, but a scheduled run of this task has already started. The owner of the task modifies the SQL code called by the task while the task is still running. The task runs and executes the SQL code in its definition using the version of the task that was current when the task started its run. When the task is resumed or is manually executed, a new version of the task is set. This new version includes the modifications to the task.

To retrieve the history of task versions, query TASK_VERSIONS Account Usage view (in the SNOWFLAKE shared database).

Automatically suspend tasks after failed runs¶

Optionally suspend tasks automatically after a specified number of consecutive runs that either fail or time out. This feature can reduce costs by suspending tasks that consume Snowflake credits but fail to run to completion.

Set the SUSPEND_TASK_AFTER_NUM_FAILURES = num parameter on a task. When the parameter is set to a value greater than 0, tasks are automatically suspended after the specified number of consecutive task runs either fail or time out.

The parameter can be set when creating a task using CREATE TASK or later using ALTER TASK. You can also change this value in Snowsight.

The SUSPEND_TASK_AFTER_NUM_FAILURES parameter can also be set at the account, database, or schema level. The setting applies to all tasks contained in the modified object. Note that explicitly setting the parameter at a lower level overrides the parameter value set at a higher level.

Automatically retry failed task runs¶

If any task completes in a FAILED state, Snowflake can automatically retry the task. The automatic task retry is disabled by default. To enable this feature, set TASK_AUTO_RETRY_ATTEMPTS to a value greater than 0.

Tasks that use error notifications send notifications for each failed retry attempt. For more information, see Configuring a task to send error notifications.

When you set the TASK_AUTO_RETRY_ATTEMPTS parameter value at the account, database, or schema level, the change is applied to tasks contained in the modified object during their next scheduled run.

Setting session parameters for tasks¶

You can set session parameters for the session in which a task runs. To do so, modify an existing task and set the desired parameter values using ALTER TASK … SET session_parameter = value[, session_parameter = value ... ] or edit the task in Snowsight.

A task supports all session parameters. For the complete list, see Parameters. Tasks don’t support account or user parameters.

Viewing the task history for your account¶

You can view the task history for your account using SQL or Snowsight. To view task history in Snowsight, refer to Viewing tasks and task graphs in Snowsight. For information about required privileges, see Viewing task history.

To view the run history for a single task:

SQL:

Query the TASK_HISTORY table function (in the Snowflake Information Schema).

To view details on a task graph run that is currently scheduled or is executing:

SQL:

Query the CURRENT_TASK_GRAPHS table function (in the Snowflake Information Schema).

To view the history for task graph runs that executed successfully, failed, or were cancelled in the past 60 minutes:

SQL:

Query the COMPLETE_TASK_GRAPHS table function (in the Snowflake Information Schema).

Query the COMPLETE_TASK_GRAPHS view view (in Account Usage).

Task costs¶

The costs associated with running a task to execute SQL code differ depending on the source of the compute resources for the task:

User-managed warehouse

Snowflake bills your account for credit usage based on warehouse usage while a task is running, similar to the warehouse usage for executing the same SQL statements in a client or the Snowflake web interface. Per-second credit billing and warehouse auto-suspend give you the flexibility to start with larger warehouse sizes and then adjust the size to match your task workloads.

Serverless compute model

Snowflake bills your account based on compute resource usage. Charges are calculated based on your total usage of the resources, including cloud service usage, measured in compute-hours credit usage. The compute-hours cost changes based on warehouse size and query runtime. For more information, see Serverless credit usage or Query: Total task cost.

Snowflake analyzes task runs in the task history to dynamically determine the correct size and number of the serverless compute resources. As Snowflake automatically scales up and down resources to manage your task runs, the cost to run the task runs scales proportionally.

To learn how many credits are consumed by tasks, refer to the “Serverless Feature Credit Table” in the Snowflake Service Consumption Table.

Consider the following best practices to optimize for cost when you create tasks:

  • Set the SCHEDULE to run less frequently.

  • Use the auto-suspend and auto-retry parameters to prevent resource waste on failing tasks.

  • Create a budget and alert on spend limits for serverless features. For more information, see Monitor credit usage with budgets.

To retrieve the current credit usage for a specific task, query the SERVERLESS_TASK_HISTORY table function. Execute the following statement as the task owner, where <database_name> is the database that contains the task and <task_name> is the name of the task:

SET num_credits = (SELECT SUM(credits_used)
  FROM TABLE(<database_name>.information_schema.serverless_task_history(
    date_range_start=>dateadd(D, -1, current_timestamp()),
    date_range_end=>dateadd(D, 1, current_timestamp()),
    task_name => '<task_name>')
    )
  );
Copy

To retrieve the current credit usage for all serverless tasks, query the SERVERLESS_TASK_HISTORY view. Execute the following statement as an account administrator:

SELECT start_time,
  end_time,
  task_id,
  task_name,
  credits_used,
  schema_id,
  schema_name,
  database_id,
  database_name
FROM snowflake.account_usage.serverless_task_history
ORDER BY start_time, task_id;
Copy

Task security¶

To get started with tasks, you must have the correct access privileges. This section describes how to manage access to tasks.

For information about task graph ownership, see Manage task graph ownership.

Access control privileges¶

Creating tasks¶

Creating tasks requires a role with a minimum of the following privileges:

Object

Privilege

Notes

Account

EXECUTE MANAGED TASK

Required only for tasks that rely on serverless compute resources.

Database

USAGE

Schema

USAGE, CREATE TASK

Warehouse

USAGE

Required only for tasks that rely on user-managed warehouses.

Running tasks¶

After a task is created, the task owner must have the following privileges for the task to run:

Object

Privilege

Notes

Account

EXECUTE TASK

Required to run any tasks the role owns. Revoking the EXECUTE TASK privilege on a role prevents all subsequent task runs from starting under that role.

Account

EXECUTE MANAGED TASK

Required only for tasks that rely on serverless compute resources.

Database

USAGE

Schema

USAGE

Task

USAGE

Warehouse

USAGE

Required only for tasks that rely on user-managed warehouses.

In addition, the role must have the permissions required to run the SQL statement executed by the task.

Viewing task history¶

To view tasks, you must have one or more of the following privileges:

  • The ACCOUNTADMIN role

  • The OWNERSHIP privilege on the task

  • The global MONITOR EXECUTION privilege

Resuming or suspending tasks¶

In addition to the task owner, a role that has the OPERATE privilege on the task can suspend or resume the task. This role must have the USAGE privilege on the database and schema that contain the task. No other privileges are required.

When a task is resumed, Snowflake verifies that the task owner role has the privileges listed in Running tasks.

Create custom roles to manage task permissions¶

With custom roles you can easily manage permissions granted to each account or role in Snowflake. To make changes to permissions for all accounts or roles using the custom role, update the custom role. Or, revoke permissions by removing the custom role.

Create a custom role to create tasks¶

Snowflake requires different permissions to create serverless and user-managed tasks.

For example, to create user-managed tasks, create a custom role named warehouse_task_creation and grant that role the CREATE TASK and USAGE privileges on the warehouse that the role can create tasks in.

USE SYSADMIN;

CREATE ROLE warehouse_task_creation
  COMMENT = 'This role can create user-managed tasks.';
Copy
USE ACCOUNTADMIN;

GRANT CREATE TASK
  ON SCHEMA schema1
  TO ROLE warehouse_task_creation;
Copy
GRANT USAGE
  ON WAREHOUSE warehouse1
  TO ROLE warehouse_task_creation;
Copy

As an example of a role that can create serverless tasks; create a custom role named serverless_task_creation and grant the role the CREATE TASK privilege and the account level EXECUTE MANAGED TASK privilege.

USE SYSADMIN;

CREATE ROLE serverless_task_creation
  COMMENT = 'This role can create serverless tasks.';
Copy
USE ACCOUNTADMIN;

GRANT CREATE TASK
  ON SCHEMA schema1
  TO ROLE serverless_task_creation;
Copy
GRANT EXECUTE MANAGED TASK ON ACCOUNT
  TO ROLE serverless_task_creation;
Copy

Create a custom role to administer tasks¶

Create a custom role and with the EXECUTE TASK privilege and grant this custom role to any task owner role to allow altering their own tasks. To remove the ability for the task owner role to execute the task, revoke this custom role from the task owner role.

For example, create a custom role name taskadmin and grant that role the EXECUTE TASK privilege. Assign the taskadmin role to a task owner role named myrole:

USE ROLE securityadmin;

CREATE ROLE taskadmin;
Copy

Set the active role to ACCOUNTADMIN before granting the account-level privileges to the new role

USE ROLE accountadmin;

GRANT EXECUTE TASK, EXECUTE MANAGED TASK ON ACCOUNT TO ROLE taskadmin;
Copy

Set the active role to SECURITYADMIN to show that this role can grant a role to another role

USE ROLE securityadmin;

GRANT ROLE taskadmin TO ROLE myrole;
Copy

For more information on creating custom roles and role hierarchies, see Configuring access control.

Drop a task owner role¶

When you delete the owner role of a task, the task transfers ownership to the role that dropped the owner role. When a task transfers ownership, it is automatically paused and new executions aren’t scheduled until the new owner resumes the task.

If you drop the role while the task is running, the task run completes processing under the dropped role.

System service task execution¶

Snowflake runs tasks with the privileges of the task owner, but task runs are not associated with a user. Instead, each run is executed by a system service. Tasks are decoupled from specific users to avoid complications that can arise when users are dropped, locked due to authentication issues, or have roles removed.

Because task runs are decoupled from a user, the query history for task runs are associated with the system service. As such, there are no user credentials for this service, and no individual can assume its identity. Activity for the system service is limited to your account. The same encryption protections and other security protocols are built into this service as are enforced for other operations.

Task DDL¶

To support creating and managing tasks, Snowflake provides the following set of special DDL commands:

In addition, providers can view, grant, or revoke access to the necessary database objects for ELT using the following standard access control DDL:

Task functions¶

To support retrieving information about tasks, Snowflake provides the following set of SQL functions: