Categories:

Data Pipeline DDL

CREATE TASK

Creates a new task in the current/specified schema or replaces an existing task.

This command also supports the following variant:

  • CREATE TASK … CLONE (creates a clone of an existing task)

See also:

ALTER TASK, DROP TASK, SHOW TASKS, DESCRIBE TASK

Syntax

CREATE [ OR REPLACE ] TASK [ IF NOT EXISTS ] <name>
  [ { WAREHOUSE = <string> } | { USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = <string> } ]
  [ SCHEDULE = '{ <num> MINUTE | USING CRON <expr> <time_zone> }' ]
  [ ALLOW_OVERLAPPING_EXECUTION = TRUE | FALSE ]
  [ <session_parameter> = <value> [ , <session_parameter> = <value> ... ] ]
  [ USER_TASK_TIMEOUT_MS = <num> ]
  [ ERROR_INTEGRATION = <integration_name> ]
  [ COPY GRANTS ]
  [ COMMENT = '<string_literal>' ]
  [ AFTER <string> ]
[ WHEN <boolean_expr> ]
AS
  <sql>

Variant Syntax

CREATE TASK … CLONE

Creates a new task with the same parameter values:

CREATE [ OR REPLACE ] TASK <name> CLONE <source_task>
  [ COPY GRANTS ]
  [ ... ]

For more details, see CREATE <object> … CLONE.
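As a sketch of the variant syntax above (the task names are hypothetical):

-- Clone an existing task; the clone is created in a suspended state.
CREATE OR REPLACE TASK mytask_clone CLONE mytask;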

Required Parameters

name

String that specifies the identifier (i.e. name) for the task; must be unique for the schema in which the task is created.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more details, see Identifier Requirements.

sql

Any one of the following:

  • Single SQL statement

  • Call to a stored procedure

  • Procedural logic using Snowflake Scripting

    Note that currently, Snowsight and the classic web interface do not support creating or modifying tasks to use Snowflake Scripting. Instead, use SnowSQL or another command-line client.

The SQL code is executed when the task runs.

Note

  • The SQL code must be executable on its own. We highly recommend that you verify the sql executes as expected before you create the task. Tasks are intended to automate SQL statements and stored procedures that have already been tested thoroughly.

  • Serverless tasks cannot invoke the following object types:

    • UDFs (user-defined functions) that contain Java or Python code.

    • Stored procedures written in Scala (using Snowpark), or which call UDFs that contain Java or Python code.

Optional Parameters

WAREHOUSE = string or USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = string

WAREHOUSE = string

Specifies the virtual warehouse that provides compute resources for task runs.

Omit this parameter to use Snowflake-managed compute resources for runs of this task. Also referred to as serverless tasks, these resources are automatically resized and scaled up or down by Snowflake as required for each workload. When a schedule is specified for a task, Snowflake adjusts the resource size to complete future runs of the task within the specified time frame. To specify the initial warehouse size for the task, set the USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = string parameter.

USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = string

Applied only to serverless tasks.

Specifies the size of the compute resources to provision for the first run of the task, before a task history is available for Snowflake to determine an ideal size. Once a task has successfully completed a few runs, Snowflake ignores this parameter setting.

Note that if the task history is unavailable for a given task, the compute resources revert to this initial size.

Note

If a WAREHOUSE = string parameter value is specified, then setting this parameter produces a user error.

The size is equivalent to the compute resources available when creating a warehouse (using CREATE WAREHOUSE): SMALL, MEDIUM, LARGE, etc. The largest size supported by the parameter is XXLARGE. If the parameter is omitted, the first runs of the task are executed using a medium-sized (MEDIUM) warehouse.

You can change the initial size (using ALTER TASK) after the task is created but before it has run successfully once. Changing the parameter after the first run of this task starts has no effect on the compute resources for current or future task runs.

Note that suspending and resuming a task does not remove the task history used to size the compute resources. The task history is only removed if the task is recreated (using the CREATE OR REPLACE TASK syntax).
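To illustrate the behavior described above (the task name is hypothetical), the initial size can be adjusted after the task is created but before its first successful run:

-- Has an effect only before the task completes its first run.
ALTER TASK mytask SET USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'LARGE';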

SCHEDULE ...

Specifies the schedule for periodically running the task:

Note

  • A schedule must be defined for a standalone task or the root task in a DAG of tasks; otherwise, the task only runs if manually executed using EXECUTE TASK.

  • A schedule cannot be specified for child tasks in a DAG.

  • USING CRON expr time_zone

    Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax.

    For a list of time zones, see the list of tz database time zones (in Wikipedia).

    The cron expression consists of the following fields:

    # __________ minute (0-59)
    # | ________ hour (0-23)
    # | | ______ day of month (1-31, or L)
    # | | | ____ month (1-12, JAN-DEC)
    # | | | | _ day of week (0-6, SUN-SAT, or L)
    # | | | | |
    # | | | | |
      * * * * *
    

    The following special characters are supported:

    *

    Wildcard. Specifies any occurrence of the field.

    L

    Stands for “last”. When used in the day-of-week field, it allows you to specify constructs such as “the last Friday” (“5L”) of a given month. In the day-of-month field, it specifies the last day of the month.

    /n

    Indicates the nth instance of a given unit of time. Each quanta of time is computed independently. For example, if 4/3 is specified in the month field, then the task is scheduled for April, July and October (i.e. every 3 months, starting with the 4th month of the year). The same schedule is maintained in subsequent years. That is, the task is not scheduled to run in January (3 months after the October run).

    Note

    • The cron expression currently evaluates against the specified time zone only. Altering the TIMEZONE parameter value for the account (or setting the value at the user or session level) does not change the time zone for the task.

    • The cron expression defines all valid run times for the task. Snowflake attempts to run a task based on this schedule; however, any valid run time is skipped if a previous run has not completed before the next valid run time starts.

    • When both a specific day of month and day of week are included in the cron expression, then the task is scheduled on days satisfying either the day of month or the day of week. For example, SCHEDULE = 'USING CRON 0 0 10-20 * TUE,THU UTC' schedules a task at midnight (0:00) on the 10th to 20th day of each month and also on any Tuesday or Thursday outside of those dates.

  • num MINUTE

    Specifies an interval (in minutes) of wait time inserted between runs of the task. Accepts positive integers only.

    Also supports num M syntax.

    To avoid ambiguity, a base interval time is set when the task is resumed.

    The base interval time starts the interval counter from the current clock time. For example, if an INTERVAL value of 10 is set and the task is enabled at 9:03 AM, then the task runs at 9:13 AM, 9:23 AM, and so on. Note that we make a best effort to ensure absolute precision, but only guarantee that tasks do not execute before their set interval occurs (e.g. in the current example, the task could first run at 9:14 AM, but will definitely not run at 9:12 AM).

    Note

    The maximum supported value is 11520 (8 days). Tasks that have a greater num MINUTE value never run.
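To illustrate the two schedule forms described above (task and warehouse names are hypothetical):

-- Interval-based: runs roughly every 30 minutes after the task is resumed.
CREATE TASK mytask_interval
  WAREHOUSE = mywh
  SCHEDULE = '30 MINUTE'
AS
  SELECT CURRENT_TIMESTAMP;

-- Cron-based: runs at midnight on the 1st of April, July, and October
-- (4/3 in the month field; see the /n special character above).
CREATE TASK mytask_cron
  WAREHOUSE = mywh
  SCHEDULE = 'USING CRON 0 0 1 4/3 * UTC'
AS
  SELECT CURRENT_TIMESTAMP;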

ALLOW_OVERLAPPING_EXECUTION = TRUE | FALSE

Specifies whether to allow multiple instances of the DAG to run concurrently.

Note

This parameter can only be set on a root task. The setting applies to all tasks in the DAG.

The parameter can be set on standalone tasks but does not affect the task behavior. Snowflake ensures only one instance of a standalone task is running at a given time.

  • TRUE ensures only one instance of a root task is running at a given time. If a root task is still running when the next scheduled run time occurs, then that scheduled time is skipped. This guarantee does not extend to child tasks. If the next scheduled run of the root task occurs while the current run of a child task is still in operation, another instance of the DAG begins.

  • FALSE ensures only one instance of a particular DAG is allowed to run at a time. The next run of a root task is scheduled only after all child tasks in the DAG have finished running. This means that if the cumulative time required to run all tasks in the DAG exceeds the explicit scheduled time set in the definition of the root task, at least one run of the DAG is skipped.

Default: FALSE
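For example, a root task that permits overlapping DAG runs might be defined as follows (all names are hypothetical):

CREATE TASK root_task
  WAREHOUSE = mywh
  SCHEDULE = '5 MINUTE'
  ALLOW_OVERLAPPING_EXECUTION = TRUE
AS
  INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);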

session_parameter = value [ , session_parameter = value ... ]

Specifies a comma-separated list of session parameters to set for the session when the task runs. A task supports all session parameters. For the complete list, see Session Parameters.

USER_TASK_TIMEOUT_MS = num

Specifies the time limit on a single run of the task before it times out (in milliseconds).

Note

  • Before you increase the time limit on a task significantly, consider whether the SQL statement initiated by the task could be optimized (either by rewriting the statement or using a stored procedure) or the warehouse size should be increased.

  • In some situations, the parameter STATEMENT_TIMEOUT_IN_SECONDS has higher precedence than USER_TASK_TIMEOUT_MS. For details, see STATEMENT_TIMEOUT_IN_SECONDS.

Values: 0 - 86400000 (1 day).

Default: 3600000 (1 hour)
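For example, to limit a single run to 10 minutes (task, warehouse, and table names are hypothetical):

CREATE TASK mytask
  WAREHOUSE = mywh
  SCHEDULE = '60 MINUTE'
  USER_TASK_TIMEOUT_MS = 600000  -- 10 minutes
AS
  INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);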

ERROR_INTEGRATION = 'integration_name'

Required only when configuring a task to send error notifications using Amazon Simple Notification Service (SNS). Specifies the name of the notification integration used to communicate with Amazon SNS. For more information, see Enabling Error Notifications for Tasks.

Note

Supported as a preview feature.

Important

To try this feature, you must first enable the 2022_03 behavior change release bundle (Snowflake Community) in your account. For information about managing bundles, see Behavior Change Release Management.
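As a sketch (the integration name is hypothetical and must reference an existing notification integration):

CREATE TASK mytask
  WAREHOUSE = mywh
  SCHEDULE = '60 MINUTE'
  ERROR_INTEGRATION = my_notification_int
AS
  INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);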

COPY GRANTS

Specifies to retain the access permissions from the original task when a new task is created using any of the following CREATE TASK variants:

  • CREATE OR REPLACE TASK

  • CREATE TASK … CLONE

The parameter copies all permissions, except OWNERSHIP, from the existing task to the new task. By default, the role that executes the CREATE TASK command owns the new task.

Note

  • If the CREATE TASK statement references more than one task (e.g. create or replace task t1 clone t2;), the COPY GRANTS clause gives precedence to the task being replaced.

  • The SHOW GRANTS output for the replacement task lists the grantee for the copied privileges as the role that executed the CREATE TASK statement, with the current timestamp when the statement was executed.

  • The operation to copy grants occurs atomically in the CREATE TASK command (i.e. within the same transaction).

Note

This parameter is not supported currently.

COMMENT = 'string_literal'

Specifies a comment for the task.

Default: No value

AFTER string

Specifies a single predecessor task for the current task. Support for identifying multiple predecessor tasks in a CREATE TASK statement is not available yet.

To define a DAG of tasks, a series of tasks that start with a scheduled root task and are linked together by their dependencies, it is currently necessary to execute an ALTER TASK … ADD AFTER statement for each additional predecessor task. For an example, see Examples (in this topic). Note that all predecessor tasks can be added later; identifying a single predecessor task in the CREATE TASK statement is not required.

When all predecessor tasks finish their runs successfully, this task begins its current run (after a brief lag).

Note

  • The root task should have a defined schedule; each of the child tasks have one or more defined predecessor tasks (i.e. tasks specified using the AFTER parameter) to link them together.

  • A task is limited to 100 predecessor tasks and 100 child tasks. In addition, a DAG is limited to a maximum of 1000 tasks total (including the root task) in either a resumed or suspended state.

  • All tasks in a DAG must have the same task owner (i.e. a single role must have the OWNERSHIP privilege on all of the tasks in the DAG).

  • All tasks in a DAG must exist in the same schema.

  • The root task must be suspended before any task is recreated (using the CREATE OR REPLACE TASK syntax) or a child task is added (using CREATE TASK … AFTER).

  • If any task in a DAG is cloned, the role that clones the task becomes the owner of the clone by default.

    • If the owner of the original task creates the clone, then the task clone retains the link between the task and the predecessor task. This means the same predecessor task triggers both the original task and the task clone.

    • If another role creates the clone, then the task clone can have a schedule but not a predecessor.

  • Accounts are currently limited to a maximum of 10000 resumed tasks.

  • Current limitations:

    • Snowflake guarantees that at most one instance of a task with a defined schedule is running at a given time; however, we cannot provide the same guarantee for tasks with a defined predecessor task.

WHEN boolean_expr

Specifies a Boolean SQL expression; multiple conditions joined with AND/OR are supported. When a task is triggered (based on its SCHEDULE or AFTER setting), it validates the conditions of the expression to determine whether to execute. If the conditions of the expression are not met, then the task skips the current run. Any tasks that identify this task as a predecessor also do not run.

SYSTEM$STREAM_HAS_DATA is the only function supported for evaluation in the SQL expression. This function indicates whether a specified stream contains change tracking data. A task evaluates whether the specified stream contains change data before starting the current run. If the result is FALSE, then the task does not run. Note that this function is designed to avoid false negatives (i.e. returning a false value even when the stream contains change data); however, the function is not guaranteed to avoid false positives (i.e. returning a true value when the stream contains no change data).

Validating the conditions of the WHEN expression does not require compute resources. The validation is instead processed in the cloud services layer. A nominal charge accrues each time a task evaluates its WHEN condition and does not run. The charges accumulate each time the task is triggered until it runs. At that time, the charge is converted to Snowflake credits and added to the compute resource usage for the task run.

Generally, the compute time to validate the condition is insignificant compared to task execution time. As a best practice, align scheduled and actual task runs as closely as possible; avoid task schedules that are wildly out of sync with actual task runs. For example, if data is inserted into a table with a stream roughly every 24 hours, do not schedule a task that checks for stream data every minute. The charge to validate the WHEN expression with each run is generally insignificant, but the charges are cumulative.

Note that daily consumption of cloud services that falls below the 10% quota of the daily usage of the compute resources accumulates no cloud services charges.

Access Control Requirements

A role used to execute this SQL command must have the following privileges at a minimum:

Privilege             Object     Notes

EXECUTE MANAGED TASK  Account    Required only for tasks that rely on Snowflake-managed compute resources for runs (serverless tasks).

CREATE TASK           Schema

USAGE                 Warehouse  Required only for tasks that rely on user-managed warehouses for runs.

Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.

For instructions on creating a custom role with a specified set of privileges, see Creating Custom Roles.

For general information about roles and privilege grants for performing SQL actions on securable objects, see Access Control in Snowflake.

Usage Notes

  • Tasks run using the privileges granted to the task owner (i.e. the role that has the OWNERSHIP privilege on the task). For the list of minimum required privileges to run tasks, see Task Security.

    We recommend that you execute a SQL statement or call a stored procedure before you include it in a task definition. Complete this step as the task owner role to ensure the role has all the required privileges on any objects referenced by the SQL.

  • The compute resources for individual runs of a task are either managed by Snowflake (i.e. serverless tasks) or a user-specified virtual warehouse. To use serverless tasks, omit the WAREHOUSE = string parameter in the CREATE TASK statement.

    Note

    Serverless tasks are supported as a preview feature.

    • Snowflake-managed resources for a task can range from the equivalent of XSMALL to XXLARGE in warehouse sizes. If you would like to include larger warehouse sizes in this range, contact Snowflake Support to request a size increase.

    • The serverless model can be enabled for one or more tasks in a DAG. Runs of individual tasks in a DAG can rely on either Snowflake- or user-managed compute resources. Enabling this serverless compute model for all tasks in the DAG is not required.

  • After creating a task, you must execute ALTER TASK … RESUME before the task will run based on the parameters specified in the task definition. Note that accounts are currently limited to a maximum of 10000 resumed tasks.

    In addition, when a task is cloned, execution of the cloned task is suspended by default and must be enabled explicitly using the same command.
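    For example (the task name is hypothetical):

    -- A newly created or cloned task is suspended; resume it to start scheduled runs.
    ALTER TASK mytask RESUME;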

  • By default, a DML statement executed without explicitly starting a transaction is automatically committed on success or rolled back on failure at the end of the statement. This behavior is called autocommit and is controlled with the AUTOCOMMIT parameter. This parameter must be set to TRUE. If the AUTOCOMMIT parameter is set to FALSE at the account level, then set the parameter to TRUE for the individual task (using ALTER TASK … SET AUTOCOMMIT = TRUE); otherwise, any DML statement executed by the task fails.
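    For example, if AUTOCOMMIT is set to FALSE at the account level, override it for the task (the task name is hypothetical):

    ALTER TASK mytask SET AUTOCOMMIT = TRUE;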

  • Multiple tasks that consume change data from a single table stream retrieve different deltas. When a task consumes the change data in a stream using a DML statement, the stream advances the offset. The change data is no longer available for the next task to consume. Currently, we recommend that only a single task consumes the change data from a stream. Multiple streams can be created for the same table and consumed by different tasks.

  • When the CREATE OR REPLACE syntax is used, the existing task is dropped and recreated using the specified definition. Note the following behaviors:

    • The recreated task is suspended by default.

    • Any current run of the task (i.e. a run with an EXECUTING state in the TASK_HISTORY output) is completed. To abort the run of the specified task, execute the SYSTEM$USER_TASK_CANCEL_ONGOING_EXECUTIONS function.

    • If a standalone or root task is recreated, the next scheduled run of the task is cancelled.
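    To abort an in-progress run before replacing a task, the function mentioned above can be called as follows (the task name is hypothetical):

    SELECT SYSTEM$USER_TASK_CANCEL_ONGOING_EXECUTIONS('mytask');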

  • Regarding metadata:

    Attention

    Customers should ensure that no personal data (other than for a User object), sensitive data, export-controlled data, or other regulated data is entered as metadata when using the Snowflake service. For more information, see Metadata Fields in Snowflake.

Examples

Single SQL Statement

Create a serverless task that queries the current timestamp every hour starting at 9 AM and ending at 5 PM on Sundays (America/Los_Angeles time zone).

The initial warehouse size is XSMALL:

CREATE TASK t1
  SCHEDULE = 'USING CRON 0 9-17 * * SUN America/Los_Angeles'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
SELECT CURRENT_TIMESTAMP;

Same as the previous example, but the task relies on a user-managed warehouse to provide the compute resources for runs:

CREATE TASK mytask_hour
  WAREHOUSE = mywh
  SCHEDULE = 'USING CRON 0 9-17 * * SUN America/Los_Angeles'
AS
SELECT CURRENT_TIMESTAMP;

Additional timing examples:

SCHEDULE Value      Description

* * * * * UTC       Every minute. UTC time zone.

0 2 * * * UTC       Every night at 2 AM. UTC time zone.

0 5,17 * * * UTC    Twice daily, at 5 AM and 5 PM (at the top of the hour). UTC time zone.

30 2 L 6 * UTC      In June, on the last day of the month, at 2:30 AM. UTC time zone.

Create a serverless task that inserts the current timestamp into a table every hour. The task sets the TIMESTAMP_INPUT_FORMAT parameter for the session in which the task runs. This session parameter specifies the format of the inserted timestamp:

CREATE TASK t1
  SCHEDULE = '60 MINUTE'
  TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);

Create a task that inserts the current timestamp into a table every 5 minutes:

CREATE TASK mytask_minute
  WAREHOUSE = mywh
  SCHEDULE = '5 MINUTE'
AS
INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);

Create a task that inserts change tracking data for INSERT operations from a stream into a table every 5 minutes. The task polls the stream using the SYSTEM$STREAM_HAS_DATA function to determine whether change data exists and, if the result is FALSE, skips the current run:

CREATE TASK mytask1
  WAREHOUSE = mywh
  SCHEDULE = '5 minute'
WHEN
  SYSTEM$STREAM_HAS_DATA('MYSTREAM')
AS
  INSERT INTO mytable1(id,name) SELECT id, name FROM mystream WHERE METADATA$ACTION = 'INSERT';

Create a serverless child task in a DAG and add multiple predecessor tasks. The child task runs only after all specified predecessor tasks have successfully completed their own runs.

Suppose that the root task for a DAG is task1 and that task2, task3, and task4 are child tasks of task1. This example adds child task task5 to the DAG and specifies task2, task3, and task4 as predecessor tasks:

-- Create task5 and specify task2 as a predecessor task.
-- The new task is a serverless task that inserts the current timestamp into a table column.
CREATE TASK task5
  AFTER task2
AS
INSERT INTO t1(ts) VALUES(CURRENT_TIMESTAMP);

-- Specify additional predecessor tasks for the new task using ALTER TASK ... ADD AFTER.
ALTER TASK task5 ADD AFTER task3;
ALTER TASK task5 ADD AFTER task4;

Stored Procedure

Create a task named my_copy_task that calls a stored procedure to unload data from the mytable table to the named mystage stage (using COPY INTO <location>) every hour:

-- Create a stored procedure that unloads data from a table
-- The COPY statement in the stored procedure unloads data to files in a path identified by epoch time (using the Date.now() method)
create or replace procedure my_unload_sp()
  returns string not null
  language javascript
  as
  $$
    var my_sql_command = "";
    my_sql_command = my_sql_command.concat("copy into @mystage","/",Date.now(),"/"," from mytable overwrite=true;");
    var statement1 = snowflake.createStatement( {sqlText: my_sql_command} );
    var result_set1 = statement1.execute();
  return my_sql_command; // Statement returned for info/debug purposes
  $$;

-- Create a task that calls the stored procedure every hour
create task my_copy_task
  warehouse = mywh
  schedule = '60 minute'
as
  call my_unload_sp();

Procedural Logic Using Snowflake Scripting

Create a serverless task that declares a variable, uses the variable, and returns the value of the variable every 2 minutes:

CREATE TASK t1
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
  SCHEDULE = '2 minute'
  AS
  DECLARE
    radius_of_circle float;
    area_of_circle float;
  BEGIN
    radius_of_circle := 3;
    area_of_circle := pi() * radius_of_circle * radius_of_circle;
    return area_of_circle;
  END;