Using Sequences

Sequences are used to generate unique numbers across sessions and statements, including concurrent statements. They can be used to generate values for a primary key or any column that requires a unique value.
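
For example, a sequence can be created and then referenced directly in queries. The following is a minimal sketch; the sequence name seq_example is illustrative, and the exact values returned assume a fresh, ordered sequence:

CREATE OR REPLACE SEQUENCE seq_example START = 1 INCREMENT = 1;

-- Each call to NEXTVAL returns a new value (typically 1, then 2, for a fresh sequence).
SELECT seq_example.NEXTVAL;
SELECT seq_example.NEXTVAL;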

Important

Snowflake does not guarantee generating sequence numbers without gaps. The generated numbers are not necessarily contiguous.

Sequence Semantics

Snowflake sequences currently utilize the following semantics:

  • All values generated by a sequence are globally unique as long as the sign of the sequence interval does not change (e.g. by changing the step size). Concurrent queries never observe the same value, and values within a single query are always distinct.

  • Changing the sequence interval from positive to negative (e.g. from 1 to -1), or vice versa, may result in duplicates. For example, if the first queries return sequence values 1, 2, and 3, and the interval is then changed from 1 to -1, the next few values generated include 2 and 1, which were generated previously.

  • Snowflake may calculate the next value for a sequence as soon as the current sequence number is used, rather than waiting until the next sequence number is requested.

    A consequence of this is that an ALTER SEQUENCE ... SET INCREMENT ... command might not affect the next operation that uses the sequence. For an example, see Understanding the Effects of Reversing the Direction of a Sequence.

  • Each generated sequence value additionally reserves values depending on the sequence interval, also called the “step”. The reserved values span from the sequence value to

    <value>  +  (sign(<step>) * abs(<step>))  -  (sign(<step>) * 1)

    (inclusive).

    Thus, if the value 100 is generated:

    • With a step of 2, values 100 and 101 are reserved.

    • With a step of 10, values 100 to 109 are reserved.

    • With a step of -5, values 96 to 100 are reserved.

    A reserved value is never generated by the sequence as long as the step/interval is never modified (see the sketch after this list).

  • Values generated by a sequence are greater than the maximum value produced by a previous statement (or less than the minimum value if the step size is negative) if the following are true:

    • The sequence does not have the NOORDER property.

      NOORDER specifies that the values are not guaranteed to be in increasing order.

      For example, if a sequence has START 1 INCREMENT 2, the generated values might be 1, 3, 101, 5, 103, etc.

      NOORDER can improve performance when multiple insert operations need to be performed concurrently (for example, when multiple clients are executing multiple INSERT statements).

    • The previous statement completed, and an acknowledgment was received, prior to submitting the current statement.

    This behavior does not hold if the sign of the interval is changed (positive to negative or negative to positive).
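
As a sketch of the value-reservation behavior described above (the sequence name seq_step10 is illustrative, and the exact values assume a fresh, ordered sequence):

CREATE OR REPLACE SEQUENCE seq_step10 START = 1 INCREMENT = 10;

-- The first generated value is 1, which reserves values 1 to 10;
-- the next generated value is therefore 11 (reserving 11 to 20), and so on.
SELECT seq_step10.NEXTVAL;
SELECT seq_step10.NEXTVAL;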

There is no guarantee that values from a sequence are contiguous (gap-free) or that the sequence values are assigned in a particular order. There is, in fact, no way to assign values from a sequence to rows in a specified order other than to use single-row statements (this still provides no guarantee about gaps).

A sequence value can represent a 64-bit two’s complement integer (-2^63 to 2^63 - 1). If the internal representation of a sequence’s next value exceeds this range (in either direction), an error results and the query fails. Note that this may result in losing these sequence values.

In this situation, you must either use a smaller (in magnitude) increment value or create a new sequence with a smaller start value. Because gaps may occur, the internal representation of the next value may exceed the allowable range even if the returned sequence values are all within the allowable range. Snowflake does not provide an explicit guarantee about when this error can occur, but sequence objects always provide unique values. A sequence object created with a start value of 1 and an increment value of 1 is extremely unlikely to exhaust the allowable range of sequence values.

Referencing Sequences

currval Not Supported

Many databases provide a currval sequence reference; however, Snowflake does not. currval in other systems is typically used to create primary-foreign key relationships between tables — a first statement inserts a single row into the fact table using a sequence to create a key. Subsequent statements insert rows into the dimension tables using currval to refer to the fact table’s key.

This pattern is contrary to Snowflake best practices — bulk queries should be preferred over small, single-row queries. The same task can be better accomplished using multi-table INSERT and sequence references in nested subqueries. For a detailed example, see Ingesting and Normalizing Denormalized Data (in this topic).
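
As a brief sketch of that approach (the tables fact and dim, their columns, and the sequence key_seq are hypothetical), a key generated once in the input SELECT of a multi-table INSERT can be shared by both targets:

CREATE OR REPLACE SEQUENCE key_seq;

CREATE OR REPLACE TABLE fact (id NUMBER, measure NUMBER);
CREATE OR REPLACE TABLE dim (fact_id NUMBER, attribute STRING);

-- Both targets receive the same key generated in the input SELECT,
-- so no currval-style follow-up statement is needed.
INSERT ALL
  INTO fact VALUES (k, m)
  INTO dim VALUES (k, a)
SELECT key_seq.NEXTVAL k, 42 m, 'example' a FROM DUAL;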

Sequences as Expressions

Sequences may be accessed in queries as expressions of the form seq_name.NEXTVAL. Each occurrence of a sequence generates a set of distinct values. This is different from what many other databases provide, where multiple references to NEXTVAL of a sequence return the same value for each row.

For example, the following query returns distinct values for columns a and b:

CREATE OR REPLACE SEQUENCE seq1;

SELECT seq1.NEXTVAL a, seq1.NEXTVAL b FROM DUAL;

To return two columns with the same generated sequence value, use nested subqueries and views:

CREATE OR REPLACE SEQUENCE seq1;

SELECT seqRef.a a, seqRef.a b FROM (SELECT seq1.NEXTVAL a FROM DUAL) seqRef;

Nested subqueries generate as many distinct sequence values as rows returned by the subquery (so a sequence reference in a query block with several joins refers not to any of the joined objects, but the output of the query block). These generated values may not be observed if the associated rows are later filtered out, or the values may be observed twice (as in the above example) if the sequence column or the inline view are referred to multiple times.
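
For example, in the following sketch (the table t and its contents are illustrative), the inline view returns three rows, so three distinct sequence values are generated, and both outer references to the column n observe the same value for each row:

CREATE OR REPLACE SEQUENCE seq1;

CREATE OR REPLACE TABLE t (v NUMBER);
INSERT INTO t VALUES (10), (20), (30);

SELECT seqRef.n a, seqRef.n b, seqRef.v FROM (SELECT seq1.NEXTVAL n, v FROM t) seqRef;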

Note

For multi-table insert, insert values may be provided both in the VALUES clauses and in the SELECT input:

  • VALUES clauses referring to a sequence value aliased from the input SELECT receive the same value.

  • VALUES clauses containing a direct reference to a sequence NEXTVAL receive distinct values.

In contrast, Oracle restricts sequence references to VALUES clauses only.
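
A minimal sketch of both cases (the target tables dst1 and dst2 are hypothetical):

CREATE OR REPLACE SEQUENCE seq1;

CREATE OR REPLACE TABLE dst1 (id NUMBER);
CREATE OR REPLACE TABLE dst2 (id NUMBER);

INSERT ALL
  INTO dst1 VALUES (s)             -- same value as the next clause, via the alias s
  INTO dst2 VALUES (s)
  INTO dst2 VALUES (seq1.NEXTVAL)  -- direct NEXTVAL reference: a distinct value
SELECT seq1.NEXTVAL s FROM DUAL;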

Sequences as Table Functions

Nested queries with sequence references are often difficult to understand and verbose — any shared reference (where two columns of a row should receive the same sequence value) requires an additional level of query nesting. To simplify nested-query syntax, Snowflake provides an additional method to generate sequences using the table function GETNEXTVAL, as in the following example:

CREATE OR REPLACE SEQUENCE seq1;

CREATE OR REPLACE TABLE foo (n NUMBER);

INSERT INTO foo VALUES (100), (101), (102);

SELECT n, s.nextval FROM foo, TABLE(GETNEXTVAL(seq1)) s;

GETNEXTVAL is a special 1-row table function that generates a unique value and joins this value to the other objects in the SELECT statement. A call to GETNEXTVAL must be aliased; otherwise, the generated values cannot be referenced. Multiple columns may refer to a generated value by accessing this alias. The GETNEXTVAL alias contains an attribute also named NEXTVAL.

The GETNEXTVAL table function additionally allows precise control over sequence generation when many tables are joined together. The order of objects in the FROM clause determines where values are generated. Sequence values are generated over the result of joins between all objects listed prior to GETNEXTVAL in the FROM clause. The resulting rows are then joined to the objects to the right. There is an implicit lateral dependence between GETNEXTVAL and all other objects in the FROM clause. Joins may not reorder around GETNEXTVAL. This is an exception in SQL, as typically the order of objects does not affect the query semantics.

Consider the following example with tables t1, t2, t3, and t4:

CREATE OR REPLACE SEQUENCE seq1;

SELECT t1.*, t2.*, t3.*, t4.*, s.NEXTVAL FROM t1, t2, TABLE(GETNEXTVAL(seq1)) s, t3, t4;

This query joins t1 to t2, generates a unique sequence value for each row of that result, and then joins the resulting relation against t3 and t4. The order of joins between the post-sequence relation, t3, and t4 is not specified because inner joins are associative.

Note

These semantics can be tricky. We recommend using GETNEXTVAL at the end of the FROM clause, when possible and appropriate, to avoid confusion.

Using Sequences to Create Default Column Values

Sequences can be used in tables to generate primary keys for table columns. The following tools provide a simple way to do this.

Column Default Expressions

The column default expression can be a sequence reference. Omitting the column in an insert statement or setting the value to DEFAULT in an insert or update statement will generate a new sequence value for the row.

For example:

CREATE OR REPLACE SEQUENCE seq1;

CREATE OR REPLACE TABLE foo (k NUMBER DEFAULT seq1.NEXTVAL, v NUMBER);

-- insert rows with unique keys (generated by seq1) and explicit values
INSERT INTO foo (v) VALUES (100);
INSERT INTO foo VALUES (DEFAULT, 101);

-- insert rows with unique keys (generated by seq1) and reused values.
-- new keys are distinct from preexisting keys.
INSERT INTO foo (v) SELECT v FROM foo;

-- insert row with explicit values for both columns
INSERT INTO foo VALUES (1000, 1001);

SELECT * FROM foo;

+------+------+
|    K |    V |
|------+------|
|    1 |  100 |
|    2 |  101 |
|    3 |  100 |
|    4 |  101 |
| 1000 | 1001 |
+------+------+

The advantage of using a sequence as a column default value is that the sequence can be referenced in other locations, and can even be the default value for multiple columns and in multiple tables. If a sequence named as the default expression of a column is subsequently dropped, any attempt to insert into or update the table using the default value results in an error stating that the identifier cannot be found.
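
For example, one sequence might serve as the default for the key columns of more than one table. The following is a sketch; the tables orders and refunds and the sequence shared_seq are hypothetical:

CREATE OR REPLACE SEQUENCE shared_seq;

CREATE OR REPLACE TABLE orders (id NUMBER DEFAULT shared_seq.NEXTVAL, amount NUMBER);
CREATE OR REPLACE TABLE refunds (id NUMBER DEFAULT shared_seq.NEXTVAL, order_id NUMBER);

-- Keys in both tables are drawn from the same sequence, so they never collide.
INSERT INTO orders (amount) VALUES (10), (20);
INSERT INTO refunds (order_id) SELECT id FROM orders;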

Ingesting and Normalizing Denormalized Data

Consider a schema with two tables, people and contact:

  • The people table contains:

    • A primary key unique identifier: id

    • Two string columns: firstName and lastName

  • The contact table contains:

    • A primary key unique identifier: id

    • A foreign key linking this contact entry to a person: p_id

    • Two string columns:

      • c_type: The type of contact (e.g. ‘email’ or ‘phone’).

      • data: The actual contact information.

Data like this frequently arrives in denormalized form during ingestion or while processing semi-structured data.

This example illustrates ingesting JSON data, denormalizing it to extract the desired data, and normalizing the data as it is inserted into tables. At the same time, it is important to create unique identifiers on rows while maintaining the intended relationships across rows of tables. We accomplish this with sequences.

  1. First, we set up the tables and sequences used in the example:

    -- primary data tables
    
    CREATE OR REPLACE TABLE people (id number, firstName string, lastName string);
    CREATE OR REPLACE TABLE contact (id number, p_id number, c_type string, data string);
    
    -- sequences to produce primary keys on our data tables
    
    CREATE OR REPLACE SEQUENCE people_seq;
    CREATE OR REPLACE SEQUENCE contact_seq;
    
    -- staging table for json
    
    CREATE OR REPLACE TABLE input (json variant);
    
  2. Next, we insert the JSON data into the staging table input:

    INSERT INTO input SELECT parse_json(
    '[
     {
       firstName : \'John\',
       lastName : \'Doe\',
       contacts : [
         {
           contactType : \'phone\',
           contactData : \'1234567890\',
         }
         ,
         {
           contactType : \'email\',
           contactData : \'jdoe@acme.com\',
         }
        ]
       }
    ,
      {
       firstName : \'Mister\',
       lastName : \'Smith\',
       contacts : [
         {
           contactType : \'phone\',
           contactData : \'0987654321\',
         }
         ,
         {
           contactType : \'email\',
           contactData : \'msmith@acme.com\',
         }
         ]
       }
     ,
       {
       firstName : \'George\',
       lastName : \'Washington\',
       contacts : [
         {
           contactType : \'phone\',
           contactData : \'1231231234\',
         }
         ,
         {
           contactType : \'email\',
           contactData : \'gwashington@acme.com\',
         }
       ]
     }
    ]'
    );
    
  3. Then, we parse and flatten the JSON, generate unique identifiers for each person and contact entry, and insert the data while preserving relationships between people and contact entries:

    INSERT ALL
      WHEN 1=1 THEN
        INTO contact VALUES (c_next, p_next, contact_value:contactType, contact_value:contactData)
      WHEN contact_index = 0 THEN
        INTO people VALUES (p_next, person_value:firstName, person_value:lastName)
    
    SELECT * FROM
    (
      SELECT f1.value person_value, f2.value contact_value, f2.index contact_index, p_seq.NEXTVAL p_next, c_seq.NEXTVAL c_next
      FROM input, LATERAL FLATTEN(input.json) f1, TABLE(GETNEXTVAL(people_seq)) p_seq,
        LATERAL FLATTEN(f1.value:contacts) f2, TABLE(GETNEXTVAL(contact_seq)) c_seq
    );
    
  4. This produces the following data (unique IDs may change):

    SELECT * FROM people;
    
    +----+-----------+------------+
    | ID | FIRSTNAME | LASTNAME   |
    |----+-----------+------------|
    |  1 | John      | Doe        |
    |  2 | Mister    | Smith      |
    |  3 | George    | Washington |
    +----+-----------+------------+
    
    SELECT * FROM contact;
    
    +----+------+--------+----------------------+
    | ID | P_ID | C_TYPE | DATA                 |
    |----+------+--------+----------------------|
    |  1 |    1 | phone  | 1234567890           |
    |  2 |    1 | email  | jdoe@acme.com        |
    |  3 |    2 | phone  | 0987654321           |
    |  4 |    2 | email  | msmith@acme.com      |
    |  5 |    3 | phone  | 1231231234           |
    |  6 |    3 | email  | gwashington@acme.com |
    +----+------+--------+----------------------+
    

As you can see, rows in the two tables are linked through people.id and contact.p_id, and can be joined on those columns.
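
For example, the normalized rows can be retrieved with a join such as the following (a sketch; the exact IDs depend on the values generated above):

SELECT p.firstName, p.lastName, c.c_type, c.data
FROM people p JOIN contact c ON c.p_id = p.id
ORDER BY p.id, c.id;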

If additional data is added, new rows continue to receive unique IDs. For example:

 TRUNCATE TABLE input;

 INSERT INTO input SELECT PARSE_JSON(
 '[
  {
    firstName : \'Genghis\',
    lastName : \'Khan\',
    contacts : [
      {
        contactType : \'phone\',
        contactData : \'1111111111\',
      }
      ,
      {
        contactType : \'email\',
        contactData : \'gkahn@acme.com\',
      }
   ]
 }
,
 {
    firstName : \'Julius\',
    lastName : \'Caesar\',
    contacts : [
      {
        contactType : \'phone\',
        contactData : \'2222222222\',
      }
      ,
      {
        contactType : \'email\',
        contactData : \'gcaesar@acme.com\',
      }
    ]
  }
 ]'
 );

 INSERT ALL
   WHEN 1=1 THEN
     INTO contact VALUES (c_next, p_next, contact_value:contactType, contact_value:contactData)
   WHEN contact_index = 0 THEN
     INTO people VALUES (p_next, person_value:firstName, person_value:lastName)
 SELECT * FROM
 (
   SELECT f1.value person_value, f2.value contact_value, f2.index contact_index, p_seq.NEXTVAL p_next, c_seq.NEXTVAL c_next
   FROM input, LATERAL FLATTEN(input.json) f1, TABLE(GETNEXTVAL(people_seq)) p_seq,
     LATERAL FLATTEN(f1.value:contacts) f2, TABLE(GETNEXTVAL(contact_seq)) c_seq
 );

 SELECT * FROM people;

 +----+-----------+------------+
 | ID | FIRSTNAME | LASTNAME   |
 |----+-----------+------------|
 |  4 | Genghis   | Khan       |
 |  5 | Julius    | Caesar     |
 |  1 | John      | Doe        |
 |  2 | Mister    | Smith      |
 |  3 | George    | Washington |
 +----+-----------+------------+

 SELECT * FROM contact;

 +----+------+--------+----------------------+
 | ID | P_ID | C_TYPE | DATA                 |
 |----+------+--------+----------------------|
 |  1 |    1 | phone  | 1234567890           |
 |  2 |    1 | email  | jdoe@acme.com        |
 |  3 |    2 | phone  | 0987654321           |
 |  4 |    2 | email  | msmith@acme.com      |
 |  5 |    3 | phone  | 1231231234           |
 |  6 |    3 | email  | gwashington@acme.com |
 |  7 |    4 | phone  | 1111111111           |
 |  8 |    4 | email  | gkahn@acme.com       |
 |  9 |    5 | phone  | 2222222222           |
 | 10 |    5 | email  | gcaesar@acme.com     |
 +----+------+--------+----------------------+

Altering a Sequence

Understanding the Effects of Reversing the Direction of a Sequence

The following example shows what happens when you reverse the direction of a sequence.

It also shows that, due to pre-calculation of sequence values, an ALTER SEQUENCE command might appear to take effect only on the second use of the sequence after the command is executed.

Create the sequence and use it as the default value for a column in a table:

CREATE OR REPLACE SEQUENCE test_sequence_wraparound_low
   START = 1
   INCREMENT = 1
   ;

CREATE OR REPLACE TABLE test_seq_wrap_low (
    i int,
    j int default test_sequence_wraparound_low.nextval
    );

Load the table:

INSERT INTO test_seq_wrap_low (i) VALUES
     (1),
     (2),
     (3);

Show the sequence values in column j:

SELECT * FROM test_seq_wrap_low ORDER BY i;
+---+---+
| I | J |
|---+---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+---+---+

Alter the increment (step size) of the sequence:

ALTER SEQUENCE test_sequence_wraparound_low SET INCREMENT = -4;

Insert two more rows:

INSERT INTO test_seq_wrap_low (i) VALUES
    (4),
    (5);

Show the sequence values. Note that the first row inserted after the ALTER SEQUENCE has the value 4, not -1, because the value 4 was pre-calculated using the old increment before the sequence was altered. The second row inserted after the ALTER SEQUENCE does take the new step size into account (4 + (-4) = 0).

SELECT * FROM test_seq_wrap_low ORDER BY i;
+---+---+
| I | J |
|---+---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 0 |
+---+---+