Create hybrid tables

This topic provides an overview on creating hybrid tables in Snowflake.

Note

To create a hybrid table, you must have a running warehouse that is specified as the current warehouse for your session. Errors may occur if no running warehouse is specified when you create a hybrid table. For more information, see Working with Warehouses.

CREATE HYBRID TABLE options

You can create a hybrid table by using one of the following SQL commands:

  • CREATE HYBRID TABLE. This example creates a hybrid table, inserts some rows into it, deletes a row, and queries the table:

    CREATE OR REPLACE HYBRID TABLE icecream (
      id NUMBER PRIMARY KEY AUTOINCREMENT START 1 INCREMENT 1,
      col1 VARCHAR NOT NULL,
      col2 VARCHAR NOT NULL
      );
    
    INSERT INTO icecream VALUES(1, 'A1', 'B1');
    INSERT INTO icecream VALUES(2, 'A2', 'B2');
    INSERT INTO icecream VALUES(3, 'A3', 'B3');
    INSERT INTO icecream VALUES(4, 'A4', 'B4');
    
    UPDATE icecream SET col2 = 'B3-updated' WHERE id = 3;
    
    DELETE FROM icecream WHERE id = 4;
    
    SELECT * FROM icecream;
    
    Copy
  • CREATE HYBRID TABLE … AS SELECT (CTAS) or CREATE HYBRID TABLE … LIKE. For example:

    CREATE OR REPLACE HYBRID TABLE table1 (
      employee_id INT PRIMARY KEY,
      department_id VARCHAR(200)
      )
    AS SELECT * FROM table2;
    
    Copy

Loading data

Note

Because the primary storage for hybrid tables is a row store, hybrid tables typically have a larger storage footprint than standard tables. The main reason for the difference is that columnar data for standard tables often achieves higher rates of compression. For details about storage costs, see Evaluate cost for hybrid tables.

You can bulk load data into hybrid tables by copying from a data stage or other tables (that is, using CTAS, COPY, or INSERT INTO … SELECT), but bulk loading data into a hybrid table using a CTAS statement is strongly recommended. Bulk loading with CTAS is optimized for performance and enables large loads (multiple millions of records) to run approximately 10 times faster than other options, in most cases.

Bulk loading via INSERT or COPY is also supported, but data loading is slower for large amounts of data. You might experience long wait times or even timeouts when loading large data sets. Queries against recently loaded data are also slower. Bulk load optimization for DML statements such as COPY is planned for the near future.

Attention

CTAS commands do not support foreign key constraints. If you must use foreign key constraints in your hybrid table, use another load option such as COPY or INSERT INTO … SELECT.

If your source data exists in an external stage instead of a Snowflake table, use the following syntax:

CREATE TABLE ... AS SELECT * FROM @stage/data.csv
Copy

Avoid using this syntax:

COPY FROM @stage/data.csv
Copy

The CTAS option is the more efficient loading method.

When you are not using CTAS, you should be able to load up to approximately 1 million records per minute. However, note that this approximation depends on the structure of the table (for example, the size of records, where larger records are slower to load). If your loading operations are taking too long and failing, test with a smaller data set or break your loads into smaller chunks.

When indexes are built on the columns in a hybrid table, especially indexes on a large number of columns, any command that loads the table (including CTAS, COPY, or INSERT INTO … SELECT) might return the following error:

The value is too long for index "IDX_HT100_COLS".

where IDX_HT100_COLS is the name of an index on the table being created.

This error occurs because row-based storage imposes a limit on the size of the data (and metadata) that can be stored per record. To reduce the record size, try creating the table without specifying larger columns (such as wide VARCHAR columns) as indexed columns.

You can also try creating the table by using INCLUDE columns on secondary indexes instead of directly indexing the columns. For a simple example, see Create a secondary index with an INCLUDE column.

Note

Other methods of loading data into Snowflake tables (for example, Snowpipe) are not currently supported.

Adding indexes to a hybrid table

You can define keys, indexes, and constraints when you create a hybrid table. For example:

CREATE OR REPLACE HYBRID TABLE targethybridtable (
    col1 VARCHAR(32) PRIMARY KEY,
    col2 NUMBER(38,0) UNIQUE,
    col3 NUMBER(38,0),
    INDEX index_col3 (col3)
    )
  AS SELECT col1, col2, col3 FROM sourcetable);
Copy

Optionally, you can create an index for an existing hybrid table using CREATE INDEX. Use this command to add an index to a hybrid table that is actively being used for a workload and is serving queries, or has foreign keys. CREATE INDEX will build an index concurrently without locking the table during the operation.

However, if your hybrid table application is in development or test mode, and some downtime for the table is not an issue, it is more efficient to recreate the hybrid table and create the indexes within the CTAS command. The bulk-loading fast path that the CTAS command provides is more efficient than online index building with the CREATE INDEX command.

A hybrid table requires a unique primary key. The data in the table is ordered by this primary key. You can add additional secondary indexes to non-primary key attributes to accelerate lookups along those attributes. To reduce the number of records to be scanned, indexes can be leveraged for =, >, >=, <, <=, and IN predicates.

Attention

To add a secondary index, you must use a role that is granted the SELECT privilege on the hybrid table. If you have access to a view of the data in the hybrid table, but not the table itself, you will not be able to add a secondary index.

If you have common, repeated queries with predicates on a specific attribute or a composite group of attributes, consider adding an index to that attribute or group of attributes to improve performance. But be aware of the following considerations when using indexes:

  • Increase in storage consumption when storing additional copies of the subset of data in the index.

  • Addition of overhead to DMLs because indexes are maintained synchronously.