Categories: Data Pipeline DDL

CREATE TASK

Creates a new task in the current/specified schema or replaces an existing task.

This command also supports the following variant:

  • CREATE TASK … CLONE (creates a clone of an existing task)

See also:

ALTER TASK, DROP TASK, SHOW TASKS

Syntax

CREATE [ OR REPLACE ] TASK [ IF NOT EXISTS ] <name>
  [ WAREHOUSE = <string> ]
  [ SCHEDULE = '{ <num> MINUTE | USING CRON <expr> <time_zone> }' ]
  [ ALLOW_OVERLAPPING_EXECUTION = TRUE | FALSE ]
  [ <session_parameter> = <value> [ , <session_parameter> = <value> ... ] ]
  [ USER_TASK_TIMEOUT_MS = <num> ]
  [ USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = <string> ]
  [ COPY GRANTS ]
  [ COMMENT = '<string_literal>' ]
  [ AFTER <string> ]
[ WHEN <boolean_expr> ]
AS
  <sql>

Variant Syntax

CREATE TASK … CLONE

Creates a new task with the same parameter values:

CREATE [ OR REPLACE ] TASK <name> CLONE <source_task>
  [ COPY GRANTS ]
  [ ... ]

For more details, see CREATE <object> … CLONE.
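
For example, a minimal sketch of the clone variant (the task names are hypothetical):

CREATE TASK mytask2 CLONE mytask1;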

Required Parameters

name

String that specifies the identifier (i.e. name) for the task; must be unique for the schema in which the task is created.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more details, see Identifier Requirements.

sql

Any single SQL statement, or a call to a stored procedure, executed when the task runs.

Note

The SQL statement or stored procedure must be executable on its own. We highly recommend that you verify the sql executes as expected before you create the task. Tasks are intended to automate SQL statements and stored procedures that have already been tested thoroughly.

Optional Parameters

WAREHOUSE = string

Specifies the virtual warehouse that is used to execute the task.

The following compute models are available to provide compute resources for runs of this task:

  • Specify a value for this parameter to provide user-managed compute resources (i.e. a warehouse) for runs of this task.

  • Omit this parameter altogether to use Snowflake-managed compute resources for runs of this task. Also referred to as the serverless compute model, these resources are automatically resized and scaled up or down by Snowflake as required for each workload. When a schedule is specified for a task, Snowflake adjusts the resource size to complete future runs of the task within the specified timeframe.

    To specify the initial warehouse size for the task, set the USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = string parameter.
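
As a sketch of the difference between the two models (the warehouse and task names here are hypothetical), the statements differ only in whether the WAREHOUSE parameter is present:

-- User-managed compute: runs use the specified warehouse
CREATE TASK t_user_managed
  WAREHOUSE = mywh
  SCHEDULE = '60 MINUTE'
AS
  SELECT CURRENT_TIMESTAMP;

-- Serverless compute: WAREHOUSE is omitted; Snowflake sizes the resources
CREATE TASK t_serverless
  SCHEDULE = '60 MINUTE'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
  SELECT CURRENT_TIMESTAMP;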

SCHEDULE ...

Specifies the schedule for periodically running the task:

Note

  • A schedule must be defined for a standalone task or the root task in a task tree; otherwise, the task never runs.

  • A schedule cannot be specified for child tasks in a task tree (i.e. tasks that have a predecessor task set using the AFTER parameter).

  • USING CRON expr time_zone

    Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax.

    For a list of time zones, see the list of tz database time zones (in Wikipedia).

    The cron expression consists of the following fields:

    # __________ minute (0-59)
    # | ________ hour (0-23)
    # | | ______ day of month (1-31, or L)
    # | | | ____ month (1-12, JAN-DEC)
    # | | | | _ day of week (0-6, SUN-SAT, or L)
    # | | | | |
    # | | | | |
      * * * * *
    

    The following special characters are supported:

    *

    Wildcard. Specifies any occurrence of the field.

    L

    Stands for “last”. When used in the day-of-week field, it allows you to specify constructs such as “the last Friday” (“5L”) of a given month. In the day-of-month field, it specifies the last day of the month.

    /n

    Indicates the nth instance of a given unit of time. Each quantum of time is computed independently. For example, if 4/3 is specified in the month field, then the task is scheduled for April, July, and October (i.e. every 3 months, starting with the 4th month of the year). The same schedule is maintained in subsequent years. That is, the task is not scheduled to run in January (3 months after the October run).

    Note

    • The cron expression currently evaluates against the specified time zone only. Altering the TIMEZONE parameter value for the account (or setting the value at the user or session level) does not change the time zone for the task.

    • The cron expression defines all valid run times for the task. Snowflake attempts to run a task based on this schedule; however, any valid run time is skipped if a previous run has not completed before the next valid run time starts.

    • When both a specific day of month and day of week are included in the cron expression, then the task is scheduled on days satisfying either the day of month or day of week. For example, SCHEDULE = 'USING CRON 0 0 10-20 * TUE,THU UTC' schedules a task at midnight UTC on the 10th to 20th day of each month and also on any Tuesday or Thursday outside of those dates.

  • num MINUTE

    Specifies an interval (in minutes) of wait time inserted between runs of the task. Accepts positive integers only.

    Also supports num M syntax.

    To avoid ambiguity, a base interval time is set when the task is enabled (i.e. resumed using ALTER TASK … RESUME).

    The base interval time starts the interval counter from the current clock time. For example, if an INTERVAL value of 10 is set and the task is enabled at 9:03 AM, then the task runs at 9:13 AM, 9:23 AM, and so on. Note that we make a best effort to ensure absolute precision, but only guarantee that tasks do not execute before their set interval occurs (e.g. in the current example, the task could first run at 9:14 AM, but will definitely not run at 9:12 AM).

    Note

    The maximum supported value is 11520 (8 days). Tasks that have a greater num MINUTE value never run.
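
To illustrate the two schedule forms side by side, a minimal sketch (the task, warehouse, and table names are hypothetical):

-- Interval-based schedule: wait 10 minutes between runs
CREATE TASK t_interval
  WAREHOUSE = mywh
  SCHEDULE = '10 MINUTE'
AS
  INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);

-- Cron-based schedule: run daily at 2:30 AM UTC
CREATE TASK t_cron
  WAREHOUSE = mywh
  SCHEDULE = 'USING CRON 30 2 * * * UTC'
AS
  INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);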

ALLOW_OVERLAPPING_EXECUTION = TRUE | FALSE

Specifies whether to allow multiple instances of the task tree to run concurrently.

Note

This parameter can only be set on a root task. The setting applies to all tasks in the tree.

The parameter can be set on standalone tasks but does not affect the task behavior. Snowflake ensures only one instance of a standalone task is running at a given time.

  • TRUE ensures only one instance of a root task is running at a given time. If a root task is still running when the next scheduled run time occurs, then that scheduled time is skipped. This guarantee does not extend to child tasks. If the next scheduled run of the root task occurs while the current run of a child task is still in operation, another instance of the task tree begins.

  • FALSE ensures only one instance of a particular tree of tasks is allowed to run at a time. The next run of a root task is scheduled only after all child tasks in the tree have finished running. This means that if the cumulative time required to run all tasks in the tree exceeds the explicit scheduled time set in the definition of the root task, at least one run of the task tree is skipped.

Default: FALSE
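
For example, a minimal sketch that permits overlapping runs of a task tree (the task, warehouse, and table names are hypothetical):

CREATE TASK root_task
  WAREHOUSE = mywh
  SCHEDULE = '5 MINUTE'
  ALLOW_OVERLAPPING_EXECUTION = TRUE
AS
  INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);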

session_parameter = value [ , session_parameter = value ... ]

Specifies a comma-separated list of session parameters to set for the session when the task runs. A task supports all session parameters. For the complete list, see Session Parameters.
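
For example, a minimal sketch that sets two standard session parameters (TIMEZONE and TIMESTAMP_INPUT_FORMAT) for the task session; the task, warehouse, and table names are hypothetical:

CREATE TASK t_session_params
  WAREHOUSE = mywh
  SCHEDULE = '60 MINUTE'
  TIMEZONE = 'America/Los_Angeles'
  TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24'
AS
  INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);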

USER_TASK_TIMEOUT_MS = num

Specifies the time limit on a single run of the task before it times out (in milliseconds).

Note

Before you increase the time limit on a task significantly, consider whether the SQL statement initiated by the task could be optimized (either by rewriting the statement or using a stored procedure) or the warehouse size should be increased.

Values: 0 - 86400000 (1 day).

Default: 3600000 (1 hour)
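
For example, a minimal sketch that caps each run at 10 minutes (600000 ms); the task, warehouse, and table names are hypothetical:

CREATE TASK t_short_timeout
  WAREHOUSE = mywh
  SCHEDULE = '60 MINUTE'
  USER_TASK_TIMEOUT_MS = 600000
AS
  INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);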

USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = string

Applied only when using the serverless compute model for runs of this task.

Specifies the size of the compute resources to provision for the first run of the task, before a task history is available for Snowflake to determine an ideal size. Once a task has successfully completed a few runs, Snowflake ignores this parameter setting.

Note that if the task history is unavailable for a given task, the compute resources revert to this initial size.

Note

If a WAREHOUSE = string parameter value is specified, then setting this parameter produces a user error.

The size is equivalent to the compute resources available when creating a warehouse (using CREATE WAREHOUSE): SMALL, MEDIUM, LARGE, etc. The largest size supported by the parameter is XXLARGE. If the parameter is omitted, the first runs of the task are executed using a medium-sized (MEDIUM) warehouse.

You can change the initial size (using ALTER TASK) after the task is created but before it has run successfully once. Changing the parameter after the first run of this task starts has no effect on the compute resources for current or future task runs.

Note that suspending and resuming a task does not remove the task history used to size the compute resources. The task history is only removed if the task is recreated (using the CREATE OR REPLACE TASK syntax).
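
For example, a sketch of setting and then adjusting the initial size before the first successful run (the task and table names are hypothetical):

-- Serverless task with a LARGE initial size
CREATE TASK t_serverless_large
  SCHEDULE = '60 MINUTE'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'LARGE'
AS
  INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);

-- The initial size can still be changed before the task has run successfully
ALTER TASK t_serverless_large SET USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XLARGE';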

COPY GRANTS

Specifies to retain the access permissions from the original task when a new task is created using any of the following CREATE TASK variants:

  • CREATE OR REPLACE TASK

  • CREATE TASK … CLONE

The parameter copies all permissions, except OWNERSHIP, from the existing task to the new task. By default, the role that executes the CREATE TASK command owns the new task.

Note:

  • If the CREATE TASK statement references more than one task (e.g. create or replace task t1 clone t2;), the COPY GRANTS clause gives precedence to the task being replaced.

  • The SHOW GRANTS output for the replacement task lists the grantee for the copied privileges as the role that executed the CREATE TASK statement, with the current timestamp when the statement was executed.

  • The operation to copy grants occurs atomically in the CREATE TASK command (i.e. within the same transaction).

Note

This parameter is not supported currently.

COMMENT = 'string_literal'

Specifies a comment for the task.

Default: No value

AFTER string

Specifies the predecessor task for the current task. When a run of the predecessor task finishes successfully, it triggers this task (after a brief lag).

This parameter enables defining a simple tree of tasks; i.e. a set of tasks organized by their dependencies. In this context, a tree is a series of tasks that start with a scheduled root task and are linked together by their dependencies.

Note

  • The root task in the tree should have a defined schedule; each of the other tasks in the tree has a defined predecessor task (i.e. a task specified using the AFTER parameter) to link them together.

  • A task is limited to a single predecessor task; however, a task can have a maximum of 100 child tasks (i.e. other tasks that identify the task as a predecessor); in addition, a simple tree of tasks is limited to a maximum of 1000 tasks total (including the root task) in either a resumed or suspended state.

    For example, task T2 is limited to a single predecessor task, e.g. T1; however, T1 can serve as the predecessor task for tasks T2, T3, T4, etc.

  • All tasks in a simple tree must have the same task owner (i.e. a single role must have the OWNERSHIP privilege on all of the tasks in the tree).

  • All tasks in a simple tree must exist in the same schema.

  • The root task in a tree of tasks must be suspended before any task in the tree is recreated (using the CREATE OR REPLACE TASK syntax) or a child task is added (using CREATE TASK … AFTER). A sketch of this sequence appears after these notes.

  • If any task in a tree is cloned, the role that clones the task becomes the owner of the clone by default.

    • If the owner of the original task creates the clone, then the task clone retains the link between the task and the predecessor task. This means the same predecessor task triggers both the original task and the task clone.

    • If another role creates the clone, then the task clone can have a schedule but not a predecessor.

  • Accounts are currently limited to a maximum of 10000 resumed tasks.

  • Current limitations:

    • Snowflake guarantees that at most one instance of a task with a defined schedule is running at a given time; however, we cannot provide the same guarantee for tasks with a defined predecessor task.
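
To illustrate the suspend-before-modify requirement noted above, a minimal sketch (the task, warehouse, and table names are hypothetical):

-- Suspend the root task before adding a child task to its tree
ALTER TASK root_task SUSPEND;

CREATE TASK child_task
  WAREHOUSE = mywh
  AFTER root_task
AS
  INSERT INTO mytable2 SELECT * FROM mytable1;

-- Resume the child task, then the root task, so both are enabled to run
ALTER TASK child_task RESUME;
ALTER TASK root_task RESUME;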

WHEN boolean_expr

Specifies a Boolean SQL expression; multiple conditions joined with AND/OR are supported. When a task is triggered (based on its SCHEDULE or AFTER setting), it validates the conditions of the expression to determine whether to execute. If the conditions of the expression are not met, then the task skips the current run. Any tasks that identify this task as a predecessor also do not run.

Currently, only the following function is supported for evaluation in the SQL expression:

SYSTEM$STREAM_HAS_DATA

Indicates whether a specified stream contains change tracking data. Used to skip the current task run if the stream contains no change data.

If the result is FALSE, then the task does not run.

Validating the conditions of the WHEN expression does not require a virtual warehouse. The validation is instead processed in the cloud services layer. A nominal charge accrues each time a task evaluates its WHEN condition and does not run. The charges accumulate each time the task is triggered until it runs. At that time, the charge is converted to Snowflake credits and added to the compute resource usage for the task run.

Generally the compute time to validate the condition is insignificant compared to task execution time. As a best practice, align scheduled and actual task runs as closely as possible. Avoid task schedules that are wildly out of sync with actual task runs. For example, if data is inserted into a table with a stream roughly every 24 hours, do not schedule a task that checks for stream data every minute. The charge to validate the WHEN expression with each run is generally insignificant, but the charges are cumulative.

Note that daily consumption of cloud services that falls below the 10% quota of the daily usage of the compute resources accumulates no cloud services charges.
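
The WHEN clause can combine multiple conditions with AND/OR. A minimal sketch (the stream, table, and warehouse names are hypothetical):

CREATE TASK t_multi_stream
  WAREHOUSE = mywh
  SCHEDULE = '5 MINUTE'
WHEN
  SYSTEM$STREAM_HAS_DATA('MYSTREAM1') OR SYSTEM$STREAM_HAS_DATA('MYSTREAM2')
AS
  INSERT INTO mytable1 SELECT * FROM mystream1;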

Usage Notes

  • The serverless compute model for tasks is supported as a preview feature.

  • If creating a task that relies on a virtual warehouse for its compute resources, the role that executes this command must have the USAGE privilege on the specified warehouse.

  • Tasks run using the role that has the OWNERSHIP privilege on the task. SQL statements executed by the task can only operate on Snowflake objects on which the role has the required privileges.

    We recommend that you execute a SQL statement or call a stored procedure before you include it in a task definition. Complete this step as the task owner role to ensure the role has all the required privileges on any objects referenced by the SQL.

  • The task owner (i.e. the role with the OWNERSHIP privilege on the task) must have the global EXECUTE TASK privilege in order for tasks to run. Revoking the EXECUTE TASK privilege from a role prevents all subsequent task runs from starting under that role.

    In addition, if the task relies on the serverless compute model for its compute resources, the task owner must have the global EXECUTE MANAGED TASK privilege.

  • The compute resources for individual runs of a task are either managed by Snowflake (i.e. the serverless compute model) or a user-specified virtual warehouse. To use the serverless compute model, omit the WAREHOUSE = string parameter in the CREATE TASK statement.

    • Snowflake-managed resources for a task can range from the equivalent of XSMALL to XXLARGE in warehouse sizes. If you would like to include larger warehouse sizes in this range, contact Snowflake Support to request a size increase.

    • The serverless model can be enabled for one or more tasks in a tree of tasks. Runs of individual tasks in a tree can rely on either Snowflake- or user-managed compute resources. Enabling this serverless compute model for all tasks in the tree is not required.

  • After creating a task, you must execute ALTER TASK … RESUME before the task will run based on the parameters specified in the task definition (a sketch of this command appears after these notes). Note that accounts are currently limited to a maximum of 10000 resumed tasks.

    In addition, when a task is cloned, execution of the cloned task is suspended by default and must be enabled explicitly using the same command.

  • By default, a DML statement executed without explicitly starting a transaction is automatically committed on success or rolled back on failure at the end of the statement. This behavior is called autocommit and is controlled with the AUTOCOMMIT parameter. This parameter must be set to TRUE. If the AUTOCOMMIT parameter is set to FALSE at the account level, then set the parameter to TRUE for the individual task (using ALTER TASK … SET AUTOCOMMIT = TRUE); otherwise, any DML statement executed by the task fails.

  • Multiple tasks that consume change data from a single table stream retrieve different deltas. When a task consumes the change data in a stream using a DML statement, the stream advances the offset. The change data is no longer available for the next task to consume. Currently, we recommend that only a single task consumes the change data from a stream. Multiple streams can be created for the same table and consumed by different tasks.

  • When the CREATE OR REPLACE syntax is used, the existing task is dropped and recreated using the specified definition. Any current run of the task (i.e. a run with an EXECUTING state in the TASK_HISTORY output) is completed. To abort the run of the specified task, execute the SYSTEM$USER_TASK_CANCEL_ONGOING_EXECUTIONS function.
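
A minimal sketch of these follow-up commands (the task name mytask is hypothetical; the AUTOCOMMIT statement is needed only if AUTOCOMMIT = FALSE at the account level):

-- If AUTOCOMMIT = FALSE at the account level, enable it for the task first
ALTER TASK mytask SET AUTOCOMMIT = TRUE;

-- A newly created (or cloned) task is suspended; resume it so scheduled runs begin
ALTER TASK mytask RESUME;

-- Abort an in-flight run, e.g. before recreating the task with CREATE OR REPLACE
SELECT SYSTEM$USER_TASK_CANCEL_ONGOING_EXECUTIONS('mytask');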

Examples

Create a serverless task that inserts the current timestamp into a table every hour starting at 9 AM and ending at 5 PM on Sundays (America/Los_Angeles time zone). The initial warehouse size is XSMALL. The task sets the TIMESTAMP_INPUT_FORMAT parameter for the session in which the task runs:

CREATE TASK t1
  SCHEDULE = 'USING CRON 0 9-17 * * SUN America/Los_Angeles'
  TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);

Same as the previous example, but the task relies on a user-managed warehouse to provide the compute resources for runs:

CREATE TASK mytask_hour
  WAREHOUSE = mywh
  SCHEDULE = 'USING CRON 0 9-17 * * SUN America/Los_Angeles'
  TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24'
AS
INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);

Additional timing examples:

SCHEDULE Value      Description
------------------  ----------------------------------------------------------------------
* * * * * UTC       Every minute. UTC time zone.
0 2 * * * UTC       Every night at 2 AM. UTC time zone.
0 5,17 * * * UTC    Twice daily, at 5 AM and 5 PM (at the top of the hour). UTC time zone.
30 2 L 6 * UTC      In June, on the last day of the month, at 2:30 AM. UTC time zone.

Create a task that inserts the current timestamp into a table every 5 minutes:

CREATE TASK mytask_minute
  WAREHOUSE = mywh
  SCHEDULE = '5 MINUTE'
AS
INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);

Create a task that inserts change tracking data for INSERT operations from a stream into a table every 5 minutes. The task polls the stream using the SYSTEM$STREAM_HAS_DATA function to determine whether change data exists and, if the result is FALSE, skips the current run:

CREATE TASK mytask1
  WAREHOUSE = mywh
  SCHEDULE = '5 minute'
WHEN
  SYSTEM$STREAM_HAS_DATA('MYSTREAM')
AS
  INSERT INTO mytable1(id,name) SELECT id, name FROM mystream WHERE METADATA$ACTION = 'INSERT';

Create a simple tree of tasks by specifying the existing mytask1 task as the predecessor task that triggers the new mytask2 task when run successfully. The new task queries the mytable table and inserts the query results into another table:

CREATE TASK mytask2
  WAREHOUSE = mywh
  AFTER mytask1
AS
INSERT INTO mytable2(id,name) SELECT id, name FROM mytable1;

Create a task named my_copy_task that calls a stored procedure to unload data from the mytable table to the named mystage stage (using COPY INTO <location>) every hour:

-- Create a stored procedure that unloads data from a table
-- The COPY statement in the stored procedure unloads data to files in a path identified by epoch time (using the Date.now() method)
create or replace procedure my_unload_sp()
  returns string not null
  language javascript
  as
  $$
    var my_sql_command = ""
    var my_sql_command = my_sql_command.concat("copy into @mystage","/",Date.now(),"/"," from mytable overwrite=true;");
    var statement1 = snowflake.createStatement( {sqlText: my_sql_command} );
    var result_set1 = statement1.execute();
  return my_sql_command; // Statement returned for info/debug purposes
  $$;

-- Create a task that calls the stored procedure every hour
create task my_copy_task
  warehouse = mywh
  schedule = '60 minute'
as
  call my_unload_sp();