Using Sequences¶
Sequences are used to generate unique numbers across sessions and statements, including concurrent statements. They can be used to generate values for a primary key or any column that requires a unique value.
Important
Snowflake does not guarantee generating sequence numbers without gaps. The generated numbers are not necessarily contiguous.
Sequence Semantics¶
Snowflake sequences currently utilize the following semantics:
All values generated by a sequence are globally unique as long as the sign of the sequence interval does not change (e.g. by changing the step size). Concurrent queries never observe the same value, and values within a single query are always distinct.
Changing the sequence interval from positive to negative (e.g. from
1
to-1
), or vice versa may result in duplicates. For example, if the first query(s) return sequence values1
,2
, and3
, and if the interval is changed from1
to-1
, then the next few values generated include2
, and1
, which were generated previously.Snowflake may calculate the next value for a sequence as soon as the current sequence number is used, rather than waiting until the next sequence number is requested.
A consequence of this is that an
ALTER SEQUENCE ... SET INCREMENT ...
command might not affect the next operation that uses the sequence. For an example, see Understanding the Effects of Reversing the Direction of a Sequence.Each generated sequence value additionally reserves values depending on the sequence interval, also called the “step”. The reserved values span from the sequence value to
<value> + (sign(<step>) * abs(<step>)) - (sign(<step>) * 1)
(inclusive).
Thus, if the value
100
is generated:With a step of
2
, values100
and101
are reserved.With a step of
10
, values100
to109
are reserved.With a step of
-5
, values96
to100
are reserved.
A reserved value is never generated by the sequence as long as the step/interval is never modified.
Values generated by a sequence are greater than the maximum value produced by a previous statement (or less than the minimum value if the step size is negative) if the following are true:
The sequence does not have the NOORDER property.
NOORDER specifies that the values are not guaranteed to be in increasing order.
For example, if a sequence has
START 1 INCREMENT 2
, the generated values might be1
,3
,101
,5
,103
, etc.NOORDER can improve performance when multiple INSERT operations are performed concurrently (for example, when multiple clients are executing multiple INSERT statements).
The previous statement completed, and an acknowledgment was received, prior to submitting the current statement.
This behavior does not hold if the sign of the interval is changed (positive to negative or negative to positive).
There is no guarantee that values from a sequence are contiguous (gap-free) or that the sequence values are assigned in a particular order. There is, in fact, no way to assign values from a sequence to rows in a specified order other than to use single-row statements (this still provides no guarantee about gaps).
A sequence value can represent a 64-bit two’s complement integer (-2^63
to 2^63 - 1
). If the internal representation of a
sequence’s next value exceeds this range (in either direction) an error results and the query fails. Note that this may result in
losing these sequence values.
In this situation, you must either use a smaller (in magnitude) increment value or create a new sequence with a smaller start value. As gaps
may occur, the internal representation of the next value may exceed the allowable range even if the returned sequence values are all within
the allowable range. Snowflake does not provide an explicit guarantee regarding how to avoid this error, but Snowflake supports sequence
objects that correctly provide unique values. A sequence object created with a start value of 1
and an increment value of
1
is extremely unlikely to exhaust the allowable range of sequence values.
Referencing Sequences¶
currval
Not Supported¶
Many databases provide a currval
sequence reference; however, Snowflake does not. currval
in other systems is typically used
to create primary-foreign key relationships between tables — a first statement inserts a single row into the fact table using a sequence
to create a key. Subsequent statements insert rows into the dimension tables using currval
to refer to the fact table’s key.
This pattern is contrary to Snowflake best practices — bulk queries should be preferred over small, single-row queries. The same task can be better accomplished using multi-table INSERT and sequence references in nested subqueries. For a detailed example, see Ingesting and Normalizing Denormalized Data (in this topic).
Sequences as Expressions¶
Sequences may be accessed in queries as expressions of the form seq_name.NEXTVAL
. Each occurrence of a sequence generates a set of
distinct values. This is different from what many other databases provide, where multiple references to NEXTVAL
of a sequence return
the same value for each row.
For example, the following query returns distinct values for columns a
and b
:
CREATE OR REPLACE SEQUENCE seq1; SELECT seq1.NEXTVAL a, seq1.NEXTVAL b FROM DUAL;
To return two columns with the same generated sequence value, use nested subqueries and views:
CREATE OR REPLACE SEQUENCE seq1; SELECT seqRef.a a, seqRef.a b FROM (SELECT seq1.NEXTVAL a FROM DUAL) seqRef;
Nested subqueries generate as many distinct sequence values as rows returned by the subquery (so a sequence reference in a query block with several joins refers not to any of the joined objects, but the output of the query block). These generated values may not be observed if the associated rows are later filtered out, or the values may be observed twice (as in the above example) if the sequence column or the inline view are referred to multiple times.
Note
For multi-table insert, insert values may be provided both in the VALUES clauses and in the SELECT input:
VALUES clauses referring to a sequence value aliased from the input SELECT receive the same value.
VALUES clauses containing a direct reference to a sequence
NEXTVAL
receive distinct values.
In contrast, Oracle restricts sequence references to VALUES clauses only.
Sequences as Table Functions¶
Nested queries with sequence references are often difficult to understand and verbose — any shared reference (where two columns of a row should receive the same sequence value) requires an additional level of query nesting. To simplify nested-query syntax, Snowflake provides an additional method to generate sequences using the table function GETNEXTVAL, as in the following example:
CREATE OR REPLACE SEQUENCE seq1; CREATE OR REPLACE TABLE foo (n NUMBER); INSERT INTO foo VALUES (100), (101), (102); SELECT n, s.nextval FROM foo, TABLE(GETNEXTVAL(seq1)) s;
GETNEXTVAL is a special 1-row table function that generates a unique value (and joins this value) to other objects in the SELECT statement.
A call to GETNEXTVAL must be aliased; otherwise, the generated values cannot be referenced. Multiple columns may refer to a generated value
by accessing this alias. The GETNEXTVAL alias contains an attribute also named NEXTVAL
.
The GETNEXTVAL table function additionally allows precise control over sequence generation when many tables are joined together. The order of objects in the FROM clause determines where values are generated. Sequence values are generated over the result of joins between all objects listed prior to GETNEXTVAL in the FROM clause. The resulting rows are then joined to the objects to the right. There is an implicit lateral dependence between GETNEXTVAL and all other objects in the FROM clause. Joins may not reorder around GETNEXTVAL. This is an exception in SQL, as typically the order of objects does not affect the query semantics.
Consider the following example with tables t1
, t2
, t3
, and t4
:
CREATE OR REPLACE SEQUENCE seq1; SELECT t1.*, t2.*, t3.*, t4.*, s.NEXTVAL FROM t1, t2, TABLE(GETNEXTVAL(seq1)) s, t3, t4;
This query will join t1
to t2
, generate a unique value of the result, and then join the resulting relation against t3
and t4
. The order of joins between the post-sequence relation, t3
, and t4
is not specified because inner joins are
associative.
Note
These semantics can be tricky. We recommend using GETNEXTVAL at the end of the FROM clause, when possible and appropriate, to avoid confusion.
Using Sequences to Create Default Column Values¶
Sequences can be used in tables to generate primary keys for table columns. The following tools provide a simple way to do this.
Column Default Expressions¶
The column default expression can be a sequence reference. Omitting the column in an insert statement or setting the value to DEFAULT in an insert or update statement will generate a new sequence value for the row.
For example:
CREATE OR REPLACE SEQUENCE seq1; CREATE OR REPLACE TABLE foo (k NUMBER DEFAULT seq1.NEXTVAL, v NUMBER); -- insert rows with unique keys (generated by seq1) and explicit values INSERT INTO foo (v) VALUES (100); INSERT INTO foo VALUES (DEFAULT, 101); -- insert rows with unique keys (generated by seq1) and reused values. -- new keys are distinct from preexisting keys. INSERT INTO foo (v) SELECT v FROM foo; -- insert row with explicit values for both columns INSERT INTO foo VALUES (1000, 1001); SELECT * FROM foo; +------+------+ | K | V | |------+------| | 1 | 100 | | 2 | 101 | | 3 | 100 | | 4 | 101 | | 1000 | 1001 | +------+------+
The advantage of using sequences as a column default value is that the sequence can be referenced in other locations, and even be the default value for multiple columns and in multiple tables. If a sequence is named as the default expression of a column and then subsequently dropped any attempt to insert/update the table using the default value will result in an error saying the identifier cannot be found.
Ingesting and Normalizing Denormalized Data¶
Consider a schema with two tables, people
and contact
:
The
people
table contains:A primary key unique identifier:
id
Two string columns:
firstName
andlastName
The
contact
table contains:A primary key unique identifier:
id
A foreign key linking this contact entry to a person:
p_id
Two string columns:
c_type
: The type of contact (e.g. ‘email’ or ‘phone’).data
: The actual contact information.
Data in this format frequently is denormalized for ingestion or while processing semi-structured data.
This example illustrates ingesting JSON data, denormalizing it to extract the desired data, and normalizing the data as it is inserted into tables. At the same time, it is important to create unique identifiers on rows while maintaining the intended relationships across rows of tables. We accomplish this with sequences.
First, we set up the tables and sequences used in the example:
-- primary data tables CREATE OR REPLACE TABLE people (id number, firstName string, lastName string); CREATE OR REPLACE TABLE contact (id number, p_id number, c_type string, data string); -- sequences to produce primary keys on our data tables CREATE OR REPLACE SEQUENCE people_seq; CREATE OR REPLACE SEQUENCE contact_seq; -- staging table for json CREATE OR REPLACE TABLE input (json variant);
Next, we insert data from table
json
:INSERT INTO input SELECT parse_json( '[ { firstName : \'John\', lastName : \'Doe\', contacts : [ { contactType : \'phone\', contactData : \'1234567890\', } , { contactType : \'email\', contactData : \'jdoe@acme.com\', } ] } , { firstName : \'Mister\', lastName : \'Smith\', contacts : [ { contactType : \'phone\', contactData : \'0987654321\', } , { contactType : \'email\', contactData : \'msmith@acme.com\', } ] } , { firstName : \'George\', lastName : \'Washington\', contacts : [ { contactType : \'phone\', contactData : \'1231231234\', } , { contactType : \'email\', contactData : \'gwashington@acme.com\', } ] } ]' );
Then, we parse and flatten the JSON, generate unique identifiers for each person and contact entry, and insert the data while preserving relationships between people and contact entries:
INSERT ALL WHEN 1=1 THEN INTO contact VALUES (c_next, p_next, contact_value:contactType, contact_value:contactData) WHEN contact_index = 0 THEN INTO people VALUES (p_next, person_value:firstName, person_value:lastName) SELECT * FROM ( SELECT f1.value person_value, f2.value contact_value, f2.index contact_index, p_seq.NEXTVAL p_next, c_seq.NEXTVAL c_next FROM input, LATERAL FLATTEN(input.json) f1, TABLE(GETNEXTVAL(people_seq)) p_seq, LATERAL FLATTEN(f1.value:contacts) f2, table(GETNEXTVAL(contact_seq)) c_seq );
This produces the following data (unique IDs may change):
SELECT * FROM people; +----+-----------+------------+ | ID | FIRSTNAME | LASTNAME | |----+-----------+------------| | 1 | John | Doe | | 2 | Mister | Smith | | 3 | George | Washington | +----+-----------+------------+ SELECT * FROM contact; +----+------+--------+----------------------+ | ID | P_ID | C_TYPE | DATA | |----+------+--------+----------------------| | 1 | 1 | phone | 1234567890 | | 2 | 1 | email | jdoe@acme.com | | 3 | 2 | phone | 0987654321 | | 4 | 2 | email | msmith@acme.com | | 5 | 3 | phone | 1231231234 | | 6 | 3 | email | gwashington@acme.com | +----+------+--------+----------------------+
As you can see, rows are linked, and can be joined, between people.id
and contact.p_id
.
If additional data is added, new rows continue to receive unique IDs. For example:
TRUNCATE TABLE input; INSERT INTO input SELECT PARSE_JSON( '[ { firstName : \'Genghis\', lastName : \'Khan\', contacts : [ { contactType : \'phone\', contactData : \'1111111111\', } , { contactType : \'email\', contactData : \'gkahn@acme.com\', } ] } , { firstName : \'Julius\', lastName : \'Caesar\', contacts : [ { contactType : \'phone\', contactData : \'2222222222\', } , { contactType : \'email\', contactData : \'gcaesar@acme.com\', } ] } ]' ); INSERT ALL WHEN 1=1 THEN INTO contact VALUES (c_next, p_next, contact_value:contactType, contact_value:contactData) WHEN contact_index = 0 THEN INTO people VALUES (p_next, person_value:firstName, person_value:lastName) SELECT * FROM ( SELECT f1.value person_value, f2.value contact_value, f2.index contact_index, p_seq.NEXTVAL p_next, c_seq.NEXTVAL c_next FROM input, LATERAL FLATTEN(input.json) f1, table(GETNEXTVAL(people_seq)) p_seq, LATERAL FLATTEN(f1.value:contacts) f2, table(GETNEXTVAL(contact_seq)) c_seq ); SELECT * FROM people; +----+-----------+------------+ | ID | FIRSTNAME | LASTNAME | |----+-----------+------------| | 4 | Genghis | Khan | | 5 | Julius | Caesar | | 1 | John | Doe | | 2 | Mister | Smith | | 3 | George | Washington | +----+-----------+------------+ SELECT * FROM contact; +----+------+--------+----------------------+ | ID | P_ID | C_TYPE | DATA | |----+------+--------+----------------------| | 1 | 1 | phone | 1234567890 | | 2 | 1 | email | jdoe@acme.com | | 3 | 2 | phone | 0987654321 | | 4 | 2 | email | msmith@acme.com | | 5 | 3 | phone | 1231231234 | | 6 | 3 | email | gwashington@acme.com | | 7 | 4 | phone | 1111111111 | | 8 | 4 | email | gkahn@acme.com | | 9 | 5 | phone | 2222222222 | | 10 | 5 | email | gcaesar@acme.com | +----+------+--------+----------------------+
Altering a Sequence¶
Understanding the Effects of Reversing the Direction of a Sequence¶
The following example shows what happens when you reverse the direction of a sequence.
This also shows that due to pre-calculation of sequence values, an ALTER SEQUENCE command might seem to take effect only after the second use of the sequence after executing the ALTER SEQUENCE command.
Create the sequence and use it as the default value for a column in a table:
CREATE OR REPLACE SEQUENCE test_sequence_wraparound_low
START = 1
INCREMENT = 1
;
CREATE or replace TABLE test_seq_wrap_low (
i int,
j int default test_sequence_wraparound_low.nextval
);
Load the table:
INSERT INTO test_seq_wrap_low (i) VALUES
(1),
(2),
(3);
Show the sequence values in column j
:
SELECT * FROM test_seq_wrap_low ORDER BY i;
+---+---+
| I | J |
|---+---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+---+---+
Alter the increment (step size) of the sequence:
ALTER SEQUENCE test_sequence_wraparound_low SET INCREMENT = -4;
Insert two more rows:
INSERT INTO test_seq_wrap_low (i) VALUES
(4),
(5);
Show the sequence values. Note that the first row inserted after the ALTER SEQUENCE
has the value 4
, not -1
. The second row inserted after the ALTER SEQUENCE
does take into account the new step size.
SELECT * FROM test_seq_wrap_low ORDER BY i;
+---+---+
| I | J |
|---+---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 0 |
+---+---+