Create hybrid tables

This topic provides an overview on creating hybrid tables in Snowflake.

Note

To create a hybrid table, you must have a running warehouse that is specified as the current warehouse for your session. Errors may occur if no running warehouse is specified when you create a hybrid table. For more information, see Working with Warehouses.

CREATE HYBRID TABLE options

You can create a hybrid table by using one of the following SQL commands:

  • CREATE HYBRID TABLE. This example creates a hybrid table, inserts some rows into it, deletes a row, and queries the table:

    CREATE OR REPLACE HYBRID TABLE icecream (
      id NUMBER PRIMARY KEY AUTOINCREMENT START 1 INCREMENT 1,
      col1 VARCHAR NOT NULL,
      col2 VARCHAR NOT NULL
      );
    
    INSERT INTO icecream VALUES(1, 'A1', 'B1');
    INSERT INTO icecream VALUES(2, 'A2', 'B2');
    INSERT INTO icecream VALUES(3, 'A3', 'B3');
    INSERT INTO icecream VALUES(4, 'A4', 'B4');
    
    UPDATE icecream SET col2 = 'B3-updated' WHERE id = 3;
    
    DELETE FROM icecream WHERE id = 4;
    
    SELECT * FROM icecream;
    
    Copy
  • CREATE HYBRID TABLE … AS SELECT (CTAS) or CREATE HYBRID TABLE … LIKE. For example:

    CREATE OR REPLACE HYBRID TABLE table1 (
      employee_id INT PRIMARY KEY,
      department_id VARCHAR(200)
      )
    AS SELECT * FROM table2;
    
    Copy

Loading data

Note

Because the primary storage for hybrid tables is a row store, hybrid tables typically have a larger storage footprint than standard tables. The main reason for the difference is that columnar data for standard tables often achieves higher rates of compression. For details about storage costs, see Evaluate cost for hybrid tables.

Optimized bulk loads

You can bulk load data into hybrid tables by copying either from a data stage or from other tables (using CTAS, COPY INTO <table>, or INSERT INTO … SELECT).

The optimization of bulk loads depends on whether the table is empty or not. In this context, a table is only considered empty if no rows have ever been loaded into it. An existing table whose rows were deleted isn’t considered empty. A table that has just been created and hasn’t yet been loaded in any way, or is being created and loaded in place with CTAS, is truly empty.

When a hybrid table is empty, all three load methods (CTAS, COPY, and INSERT INTO … SELECT) use optimized bulk loading, which is a fast execution model for inserting data into hybrid tables.

Currently, when hybrid tables are not empty, optimized bulk loading is not used. Incremental batch loads with COPY and INSERT INTO … SELECT operations can still be performed, but will typically be less efficient. You should see load speeds of approximately 1 million records per minute, but note that this estimate depends on the structure of the table (for example, larger records are slower to load). If your loads, updates, and merges take too long and fail, test with a smaller data set or break these operations into smaller chunks. Optimized bulk loading will be extended to support incremental batch loads in a future release.

You can check the Statistics information in the query profile in Snowsight to see whether the bulk-load fast path was used. Number of rows inserted is referred to as the Number of rows bulk loaded when the fast path is used. For example, this CTAS operation bulk loaded 200000 rows into a new table:

CTAS query profile that uses the optimized bulk loading

A subsequent incremental batch load into the same table would not use optimized bulk loading.

For more information about query profiles, see Analyze query profiles for hybrid tables and Monitor query activity with Query History.

Attention

CTAS commands do not support FOREIGN KEY constraints. If your hybrid table requires FOREIGN KEY constraints, use COPY or INSERT INTO … SELECT to load the table.

Note

Other methods of loading data into Snowflake tables (for example, Snowpipe) are not currently supported.

Adding indexes to a hybrid table

You can define keys, indexes, and constraints when you create a hybrid table. For example:

CREATE OR REPLACE HYBRID TABLE target_hybrid_table (
    col1 VARCHAR(32) PRIMARY KEY,
    col2 NUMBER(38,0) UNIQUE,
    col3 NUMBER(38,0),
    INDEX index_col3 (col3)
    )
  AS SELECT col1, col2, col3 FROM source_table);
Copy

Optionally, you can create an index for an existing hybrid table using CREATE INDEX. Use this command to add an index to a hybrid table that is actively being used for a workload and is serving queries, or has foreign keys. CREATE INDEX will build an index concurrently without locking the table during the operation.

However, if your hybrid table application is in development or test mode, and some downtime for the table is not an issue, it is more efficient to recreate the hybrid table and create the indexes by running an optimized bulk load, which is more efficient than online index building with the CREATE INDEX command.

Optimized bulk loading is supported for CTAS, COPY, and INSERT INTO … SELECT, but you cannot use CTAS if your table has a FOREIGN KEY constraint. The second table created in this example, fk_hybrid_table, would have to be bulk-loaded with COPY or INSERT INTO … SELECT:

CREATE OR REPLACE HYBRID TABLE ref_hybrid_table (
    col1 VARCHAR(32) PRIMARY KEY,
    col2 INT UNIQUE);

CREATE OR REPLACE HYBRID TABLE fk_hybrid_table (
    col1 VARCHAR(32) PRIMARY KEY,
    col2 NUMBER(38,0),
    col3 NUMBER(38,0),
    FOREIGN KEY (col2) REFERENCES ref_hybrid_table(col2),
    INDEX index_col3 (col3)
);
Copy

All hybrid tables require a unique primary key. The data in a hybrid table is ordered by this primary key. You can add additional secondary indexes to non-primary key attributes to accelerate lookups along those attributes. To reduce the number of records to be scanned, indexes can be leveraged for =, >, >=, <, <=, and IN predicates.

Attention

To add a secondary index, you must use a role that is granted the SELECT privilege on the hybrid table. If you have access to a view of the data in the hybrid table, but not the table itself, you will not be able to add a secondary index.

If you have common, repeated queries with predicates on a specific attribute or a composite group of attributes, consider adding an index to that attribute or group of attributes to improve performance. But be aware of the following considerations when using indexes:

  • Increase in storage consumption when storing additional copies of the subset of data in the index.

  • Addition of overhead to DMLs because indexes are maintained synchronously.

Index-building errors during loads

When indexes are being built on the columns in a hybrid table, especially indexes on a large number of columns, any command that loads the table (including CTAS, COPY, or INSERT INTO … SELECT) might return the following error:

The value is too long for index "IDX_HT100_COLS".

In this example, IDX_HT100_COLS is the name of an index on the table being created.

This error occurs because row-based storage imposes a limit on the size of the data (and metadata) that can be stored per record. To reduce the record size, try creating the table without specifying larger columns (such as wide VARCHAR columns) as indexed columns.

You can also try creating the table by using INCLUDE columns on secondary indexes instead of directly indexing the columns. For a simple example, see Create a secondary index with an INCLUDE column.