Categories:

Data Pipeline DDL

CREATE TASK

Creates a new task in the current/specified schema or replaces an existing task.

This command also supports the following variant:

  • CREATE TASK … CLONE (creates a clone of an existing task)

See also:

ALTER TASK , DROP TASK , SHOW TASKS , DESCRIBE TASK

In this Topic:

Syntax

CREATE [ OR REPLACE ] TASK [ IF NOT EXISTS ] <name>
  [ { WAREHOUSE = <string> } | { USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = <string> } ]
  [ SCHEDULE = '{ <num> MINUTE | USING CRON <expr> <time_zone> }' ]
  [ ALLOW_OVERLAPPING_EXECUTION = TRUE | FALSE ]
  [ <session_parameter> = <value> [ , <session_parameter> = <value> ... ] ]
  [ USER_TASK_TIMEOUT_MS = <num> ]
  [ SUSPEND_TASK_AFTER_NUM_FAILURES = <num> ]
  [ ERROR_INTEGRATION = <integration_name> ]
  [ COPY GRANTS ]
  [ COMMENT = '<string_literal>' ]
  [ AFTER <string> [ , <string> , ... ] ]
[ WHEN <boolean_expr> ]
AS
  <sql>

Variant Syntax

CREATE TASK … CLONE

Creates a new task with the same parameter values:

CREATE [ OR REPLACE ] TASK <name> CLONE <source_task>
  [ COPY GRANTS ]
  [ ... ]

For more details, see CREATE <object> … CLONE.

Required Parameters

name

String that specifies the identifier (i.e. name) for the task; must be unique for the schema in which the task is created.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more details, see Identifier Requirements.

sql

Any one of the following:

  • Single SQL statement

  • Call to a stored procedure

  • Procedural logic using Snowflake Scripting

    Note that currently, Snowsight and the classic web interface do not support creating or modifying tasks to use Snowflake Scripting. Instead, use SnowSQL or another command-line client.

The SQL code is executed when the task runs.

Note

  • The SQL code must be executable on its own. We highly recommend that you verify the sql executes as expected before you create the task. Tasks are intended to automate SQL statements and stored procedures that have already been tested thoroughly.

  • Serverless tasks cannot invoke the following object types:

    • UDFs (user-defined functions) that contain Java or Python code.

    • Stored procedures written in Scala (using Snowpark), or which call UDFs that contain Java or Python code.

Optional Parameters

WAREHOUSE = string or . USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = string

WAREHOUSE = string

Specifies the virtual warehouse that that provides compute resources for task runs.

Omit this parameter to use Snowflake-managed compute resources for runs of this task. Also referred to as serverless tasks, these resources are automatically resized and scaled up or down by Snowflake as required for each workload. When a schedule is specified for a task, Snowflake adjusts the resource size to complete future runs of the task within the specified time frame. To specify the initial warehouse size for the task, set the USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = string parameter.

USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = string

Applied only to serverless tasks.

Specifies the size of the compute resources to provision for the first run of the task, before a task history is available for Snowflake to determine an ideal size. Once a task has successfully completed a few runs, Snowflake ignores this parameter setting.

Note that if the task history is unavailable for a given task, the compute resources revert to this initial size.

Note

If a WAREHOUSE = string parameter value is specified, then setting this parameter produces a user error.

The size is equivalent to the compute resources available when creating a warehouse (using CREATE WAREHOUSE): SMALL, MEDIUM, LARGE, etc. The largest size supported by the parameter is XXLARGE. If the parameter is omitted, the first runs of the task are executed using a medium-sized (MEDIUM) warehouse.

You can change the initial size (using ALTER TASK) after the task is created but before it has run successfully once. Changing the parameter after the first run of this task starts has no effect on the compute resources for current or future task runs.

Note that suspending and resuming a task does not remove the task history used to size the compute resources. The task history is only removed if the task is recreated (using the CREATE OR REPLACE TASK syntax).

For more information about this parameter, see USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE.

SCHEDULE ...

Specifies the schedule for periodically running the task:

Note

  • A schedule must be defined for a standalone task or the root task in a DAG of tasks; otherwise, the task only runs if manually executed using EXECUTE TASK.

  • A schedule cannot be specified for child tasks in a DAG.

  • USING CRON expr time_zone

    Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax.

    For a list of time zones, see the list of tz database time zones (in Wikipedia).

    The cron expression consists of the following fields:

    # __________ minute (0-59)
    # | ________ hour (0-23)
    # | | ______ day of month (1-31, or L)
    # | | | ____ month (1-12, JAN-DEC)
    # | | | | _ day of week (0-6, SUN-SAT, or L)
    # | | | | |
    # | | | | |
      * * * * *
    

    The following special characters are supported:

    *

    Wildcard. Specifies any occurrence of the field.

    L

    Stands for “last”. When used in the day-of-week field, it allows you to specify constructs such as “the last Friday” (“5L”) of a given month. In the day-of-month field, it specifies the last day of the month.

    /n

    Indicates the nth instance of a given unit of time. Each quanta of time is computed independently. For example, if 4/3 is specified in the month field, then the task is scheduled for April, July and October (i.e. every 3 months, starting with the 4th month of the year). The same schedule is maintained in subsequent years. That is, the task is not scheduled to run in January (3 months after the October run).

    Note

    • The cron expression currently evaluates against the specified time zone only. Altering the TIMEZONE parameter value for the account (or setting the value at the user or session level) does not change the time zone for the task.

    • The cron expression defines all valid run times for the task. Snowflake attempts to run a task based on this schedule; however, any valid run time is skipped if a previous run has not completed before the next valid run time starts.

    • When both a specific day of month and day of week are included in the cron expression, then the task is scheduled on days satisfying either the day of month or day of week. For example, SCHEDULE = 'USING CRON 0 0 10-20 * TUE,THU UTC' schedules a task at 0AM on any 10th to 20th day of the month and also on any Tuesday or Thursday outside of those dates.

    • The shortest granularity of time in cron is minutes. If a task is resumed during the minute defined in its cron expression, the first scheduled run of the task is the next occurrence of the instance of the cron expression. For example, if task scheduled to run daily at midnight (USING CRON 0 0 * * *) is resumed at midnight plus 5 seconds (00:00:05), the first task run is scheduled for the following midnight.

  • num MINUTE

    Specifies an interval (in minutes) of wait time inserted between runs of the task. Accepts positive integers only.

    Also supports num M syntax.

    To avoid ambiguity, a base interval time is set when:

    The base interval time starts the interval counter from the current clock time. For example, if an INTERVAL value of 10 is set and the task is enabled at 9:03 AM, then the task runs at 9:13 AM, 9:23 AM, and so on. Note that we make a best effort to ensure absolute precision, but only guarantee that tasks do not execute before their set interval occurs (e.g. in the current example, the task could first run at 9:14 AM, but will definitely not run at 9:12 AM).

    Note

    The maximum supported value is 11520 (8 days). Tasks that have a greater num MINUTE value never run.

ALLOW_OVERLAPPING_EXECUTION = TRUE | FALSE

Specifies whether to allow multiple instances of the DAG to run concurrently.

Note

This parameter can only be set on a root task. The setting applies to all tasks in the DAG.

The parameter can be set on standalone tasks but does not affect the task behavior. Snowflake ensures only one instance of a standalone task is running at a given time.

  • TRUE ensures only one instance of a root task is running at a given time. If a root task is still running when the next scheduled run time occurs, then that scheduled time is skipped. This guarantee does not extend to child tasks. If the next scheduled run of the root task occurs while the current run of a child task is still in operation, another instance of the DAG begins.

  • FALSE ensures only one instance of a particular DAG is allowed to run at a time. The next run of a root task is scheduled only after all child tasks in the DAG have finished running. This means that if the cumulative time required to run all tasks in the DAG exceeds the explicit scheduled time set in the definition of the root task, at least one run of the DAG is skipped.

Default: FALSE

session_parameter = value [ , session_parameter = value ... ]

Specifies a comma-separated list of session parameters to set for the session when the task runs. A task supports all session parameters. For the complete list, see Session Parameters.

USER_TASK_TIMEOUT_MS = num

Specifies the time limit on a single run of the task before it times out (in milliseconds).

Note

  • Before you increase the time limit on a task significantly, consider whether the SQL statement initiated by the task could be optimized (either by rewriting the statement or using a stored procedure) or the warehouse size should be increased.

  • In some situations, the parameter STATEMENT_TIMEOUT_IN_SECONDS has higher precedence than USER_TASK_TIMEOUT_MS. For details, see STATEMENT_TIMEOUT_IN_SECONDS.

For more information about this parameter, see USER_TASK_TIMEOUT_MS.

Values: 0 - 86400000 (1 day).

Default: 3600000 (1 hour)

SUSPEND_TASK_AFTER_NUM_FAILURES = num

Specifies the number of consecutive failed task runs after which the current task is suspended automatically. Failed task runs include runs in which the SQL code in the task body either produces a user error or times out. Task runs that are skipped, canceled, or that fail due to a system error are considered indeterminate and are not included in the count of failed task runs.

Set the parameter on a standalone task or the root task in a DAG. When the parameter is set to a value greater than 0, the following behavior applies to runs of the standalone task or DAG:

  • Standalone tasks are automatically suspended after the specified number of consecutive task runs either fail or time out.

  • The root task is automatically suspended after the run of any single task in a DAG fails or times out the specified number of times in consecutive runs.

The setting applies to tasks that rely on either Snowflake-managed compute resources (i.e. serverless compute model) or user-managed compute resources (i.e. a virtual warehouse).

For more information about this parameter, see SUSPEND_TASK_AFTER_NUM_FAILURES.

Values: 0 - No upper limit.

Default: 0 (no automatic suspension)

ERROR_INTEGRATION = 'integration_name'

Required only when configuring a task to send error notifications using Amazon Simple Notification Service (SNS). Specifies the name of the notification integration used to communicate with Amazon SNS. For more information, see Enabling Error Notifications for Tasks.

COPY GRANTS

Specifies to retain the access permissions from the original task when a new task is created using any of the following CREATE TASK variants:

  • CREATE OR REPLACE TASK

  • CREATE TASK … CLONE

The parameter copies all permissions, except OWNERSHIP, from the existing task to the new task. By default, the role that executes the CREATE TASK command owns the new task.

Note:

  • If the CREATE TASK statement references more than one task (e.g. create or replace task t1 clone t2;), the COPY GRANTS clause gives precedence to the task being replaced.

  • The SHOW GRANTS output for the replacement task lists the grantee for the copied privileges as the role that executed the CREATE TASK statement, with the current timestamp when the statement was executed.

  • The operation to copy grants occurs atomically in the CREATE TASK command (i.e. within the same transaction).

Note

This parameter is not supported currently.

COMMENT = 'string_literal'

Specifies a comment for the task.

Default: No value

AFTER string [ , string , ... ]

Specifies one or more predecessor tasks for the current task. Use this option to create a DAG of tasks or add this task to an existing DAG. A DAG is a series of tasks that starts with a scheduled root task and is linked together by dependencies.

Note that the structure of a DAG can be defined after all of its component tasks are created. Execute ALTER TASK … ADD AFTER statements to specify the predecessors for each task in the planned DAG.

A task runs after all of its predecessor tasks have finished their own runs successfully (after a brief lag).

Note

  • The root task should have a defined schedule. Each child task must one or more defined predecessor tasks (i.e. tasks specified using the AFTER parameter) to link the tasks together.

  • A single task is limited to 100 predecessor tasks and 100 child tasks. In addition, a DAG is limited to a maximum of 1000 tasks total (including the root task) in either a resumed or suspended state.

  • Accounts are currently limited to a maximum of 10000 resumed tasks.

  • All tasks in a DAG must have the same task owner (i.e. a single role must have the OWNERSHIP privilege on all of the tasks in the DAG).

  • All tasks in a DAG must exist in the same schema.

  • The root task must be suspended before any task is recreated (using the CREATE OR REPLACE TASK syntax) or a child task is added (using CREATE TASK … AFTER or ALTER TASK … ADD AFTER) or removed (using ALTER TASK … REMOVE AFTER).

  • If any task in a DAG is cloned, the role that clones the task becomes the owner of the clone by default.

    • If the owner of the original task creates the clone, then the task clone retains the link between the task and the predecessor task. This means the same predecessor task triggers both the original task and the task clone.

    • If another role creates the clone, then the task clone can have a schedule but not a predecessor.

  • Current limitations:

    • Snowflake guarantees that at most one instance of a task with a defined schedule is running at a given time; however, we cannot provide the same guarantee for tasks with a defined predecessor task.

WHEN boolean_expr

Specifies a Boolean SQL expression; multiple conditions joined with AND/OR are supported. When a task is triggered (based on its SCHEDULE or AFTER setting), it validates the conditions of the expression to determine whether to execute. If the conditions of the expression are not met, then the task skips the current run. Any tasks that identify this task as a predecessor also do not run.

SYSTEM$STREAM_HAS_DATA is the only function supported for evaluation in the SQL expression. This function indicates whether a specified stream contains change tracking data. A task evaluates whether the specified stream contains change data before starting the current run. If the result is FALSE, then the task does not run. Note that this function is designed to avoid false negatives (i.e. returning a false value even when the stream contains change data); however, the function is not guaranteed to avoid false positives (i.e. returning a true value when the stream contains no change data).

Validating the conditions of the WHEN expression does not require compute resources. The validation is instead processed in the cloud services layer. A nominal charge accrues each time a task evaluates its WHEN condition and does not run. The charges accumulate each time the task is triggered until it runs. At that time, the charge is converted to Snowflake credits and added to the compute resource usage for the task run.

Generally the compute time to validate the condition is insignificant compared to task execution time. As a best practice, align scheduled and actual task runs as closely as possible. Avoid task schedules that are wildly out of synch with actual task runs. For example, if data is inserted into a table with a stream roughly every 24 hours, do not schedule a task that checks for stream data every minute. The charge to validate the WHEN expression with each run is generally insignificant, but the charges are cumulative.

Note that daily consumption of cloud services that falls below the 10% quota of the daily usage of the compute resources accumulates no cloud services charges.

Access Control Requirements

A role used to execute this SQL command must have the following privileges at a minimum:

Privilege

Object

Notes

EXECUTE MANAGED TASK

Account

Required only for tasks that rely on Snowflake-managed compute resources for runs (serverless tasks).

CREATE TASK

Schema

USAGE

Warehouse

Required only for tasks that rely on user-managed warehouses for runs.

Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.

For instructions on creating a custom role with a specified set of privileges, see Creating Custom Roles.

For general information about roles and privilege grants for performing SQL actions on securable objects, see Access Control in Snowflake.

Usage Notes

  • Tasks run using the privileges granted to the task owner (i.e. the role that has the OWNERSHIP privilege on the task). For the list of minimum required privileges to run tasks, see Task Security.

    We recommend that you execute a SQL statement or call a stored procedure before you include it in a task definition. Complete this step as the task owner role to ensure the role has all the required privileges on any objects referenced by the SQL.

  • The compute resources for individual runs of a task are either managed by Snowflake (i.e. serverless tasks) or a user-specified virtual warehouse. To use serverless tasks, omit the WAREHOUSE = string parameter in the CREATE TASK statement.

    • Snowflake-managed resources for a task can range from the equivalent of XSMALL to XXLARGE in warehouse sizes. If you would like to include larger warehouse sizes in this range, contact Snowflake Support to request a size increase.

    • The serverless model can be enabled for one or more tasks in a DAG. Runs of individual tasks in a DAG can rely on either Snowflake- or user-managed compute resources. Enabling this serverless compute model for all tasks in the DAG is not required.

  • After creating a task, you must execute ALTER TASK … RESUME before the task will run based on the parameters specified in the task definition. Note that accounts are currently limited to a maximum of 10000 resumed tasks.

    In addition, when a task is cloned, execution of the cloned task is suspended by default and must be enabled explicitly using the same command.

  • By default, a DML statement executed without explicitly starting a transaction is automatically committed on success or rolled back on failure at the end of the statement. This behavior is called autocommit and is controlled with the AUTOCOMMIT parameter. This parameter must be set to TRUE. If the AUTOCOMMIT parameter is set to FALSE at the account level, then set the parameter to TRUE for the individual task (using ALTER TASK … SET AUTOCOMMIT = TRUE); otherwise, any DML statement executed by the task fails.

  • Multiple tasks that consume change data from a single table stream retrieve different deltas. When a task consumes the change data in a stream using a DML statement, the stream advances the offset. The change data is no longer available for the next task to consume. Currently, we recommend that only a single task consumes the change data from a stream. Multiple streams can be created for the same table and consumed by different tasks.

  • When the CREATE OR REPLACE syntax is used, the existing task is dropped and recreated using the specified definition. Note the following behaviors:

    • The recreated task is suspended by default.

    • Any current run of the task (i.e. a run with an EXECUTING state in the TASK_HISTORY output) is completed. To abort the run of the specified task, execute the SYSTEM$USER_TASK_CANCEL_ONGOING_EXECUTIONS function.

    • If a standalone or root task is recreated, the next scheduled run of the task is cancelled.

    • CREATE OR REPLACE <object> statements are atomic. That is, when the object is replaced, the old object deletion and the new object creation are processed in a single transaction.

  • Regarding metadata:

    Attention

    Customers should ensure that no personal data (other than for a User object), sensitive data, export-controlled data, or other regulated data is entered as metadata when using the Snowflake service. For more information, see Metadata Fields in Snowflake.

Examples

Single SQL Statement

Create a serverless task that queries the current timestamp every hour starting at 9 AM and ending at 5 PM on Sundays (America/Los_Angeles time zone).

The initial warehouse size is XSMALL:

CREATE TASK t1
  SCHEDULE = 'USING CRON 0 9-17 * * SUN America/Los_Angeles'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
SELECT CURRENT_TIMESTAMP;

Same as the previous example, but the task relies on a user-managed warehouse to provide the compute resources for runs:

CREATE TASK mytask_hour
  WAREHOUSE = mywh
  SCHEDULE = 'USING CRON 0 9-17 * * SUN America/Los_Angeles'
AS
SELECT CURRENT_TIMESTAMP;

Additional timing examples:

SCHEDULE Value

Description

* * * * * UTC

Every minute. UTC time zone.

0 2 * * * UTC

Every night at 2 AM. UTC time zone.

0 5,17 * * * UTC

Twice daily, at 5 AM and 5 PM (at the top of the hour). UTC time zone.

30 2 L 6 * UTC

In June, on the last day of the month, at 2:30 AM. UTC time zone.

Create a serverless task that inserts the current timestamp into a table every hour. The task sets the TIMESTAMP_INPUT_FORMAT parameter for the session in which the task runs. This session parameter specifies the format of the inserted timestamp:

CREATE TASK t1
  SCHEDULE = '60 MINUTE'
  TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);

Create a task that inserts the current timestamp into a table every 5 minutes:

CREATE TASK mytask_minute
  WAREHOUSE = mywh
  SCHEDULE = '5 MINUTE'
AS
INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);

Create a task that inserts change tracking data for INSERT operations from a stream into a table every 5 minutes. The task polls the stream using the SYSTEM$STREAM_HAS_DATA function to determine whether change data exists and, if the result is FALSE, skips the current run:

CREATE TASK mytask1
  WAREHOUSE = mywh
  SCHEDULE = '5 minute'
WHEN
  SYSTEM$STREAM_HAS_DATA('MYSTREAM')
AS
  INSERT INTO mytable1(id,name) SELECT id, name FROM mystream WHERE METADATA$ACTION = 'INSERT';

Create a serverless child task in a DAG and add multiple predecessor tasks. The child task runs only after all specified predecessor tasks have successfully completed their own runs.

Suppose that the root task for a DAG is task1 and that task2, task3, and task4 are child tasks of task1. This example adds child task task5 to the DAG and specifies task2, task3, and task4 as predecessor tasks:

-- Create task5 and specify task2, task3, task4 as predecessors tasks.
-- The new task is a serverless task that inserts the current timestamp into a table column.
CREATE TASK task5
  AFTER task2, task3, task4
AS
INSERT INTO t1(ts) VALUES(CURRENT_TIMESTAMP);

Stored Procedure

Create a task named my_copy_task that calls a stored procedure to unload data from the mytable table to the named mystage stage (using COPY INTO <location>) every hour:

-- Create a stored procedure that unloads data from a table
-- The COPY statement in the stored procedure unloads data to files in a path identified by epoch time (using the Date.now() method)
create or replace procedure my_unload_sp()
  returns string not null
  language javascript
  as
  $$
    var my_sql_command = ""
    var my_sql_command = my_sql_command.concat("copy into @mystage","/",Date.now(),"/"," from mytable overwrite=true;");
    var statement1 = snowflake.createStatement( {sqlText: my_sql_command} );
    var result_set1 = statement1.execute();
  return my_sql_command; // Statement returned for info/debug purposes
  $$;

-- Create a task that calls the stored procedure every hour
create task my_copy_task
  warehouse = mywh
  schedule = '60 minute'
as
  call my_unload_sp();

Multiple SQL Statements Using SnowSQL

Create a task that executes multiple SQL statements. In this example, the task modifies the TIMESTAMP_OUTPUT_FORMAT for the session and then queries the CURRENT_TIMESTAMP function.

Note

The SQL code in the task definition includes multiple statements. To execute the CREATE TASK statement, you must temporarily set a character other than a semicolon as the delimiter for SQL statements; otherwise, the CREATE TASK statement would return a user error. The command to change the SQL delimiter in SnowSQL is !set sql_delimiter = <character>.

!set sql_delimiter=/
CREATE OR REPLACE TASK test_logging
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
  SCHEDULE = 'USING CRON  0 * * * * America/Los_Angeles'
AS
BEGIN
  ALTER SESSION SET TIMESTAMP_OUTPUT_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF';
  SELECT CURRENT_TIMESTAMP;
END;/
!set sql_delimiter=";"

Procedural Logic Using Snowflake Scripting

Create a task that declares a variable, uses the variable, and returns the value of the variable every 2 minutes:

CREATE TASK t1
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
  SCHEDULE = '2 minute'
  AS
  EXECUTE IMMEDIATE
  $$
  DECLARE
    radius_of_circle float;
    area_of_circle float;
  BEGIN
    radius_of_circle := 3;
    area_of_circle := pi() * radius_of_circle * radius_of_circle;
    return area_of_circle;
  END;
  $$;
Back to top