Create hybrid tables¶

This topic provides an overview on creating hybrid tables in Snowflake.

Note

To create a hybrid table, you must have a running warehouse that is specified as the current warehouse for your session. Errors may occur if no running warehouse is specified when you create a hybrid table. For more information, see Working with Warehouses.

CREATE HYBRID TABLE options¶

You can create a hybrid table by using one of the following methods.

CREATE HYBRID TABLE. The following example creates a hybrid table with a required PRIMARY KEY constraint, inserts some rows, deletes a row, and queries the table:

CREATE OR REPLACE HYBRID TABLE application_log (
  id NUMBER PRIMARY KEY AUTOINCREMENT,
  col1 VARCHAR(20),
  col2 VARCHAR(20) NOT NULL
  );

INSERT INTO application_log (col1, col2) VALUES ('A1', 'B1');
INSERT INTO application_log (col1, col2) VALUES ('A2', 'B2');
INSERT INTO application_log (col1, col2) VALUES ('A3', 'B3');
INSERT INTO application_log (col1, col2) VALUES ('A4', 'B4');

SELECT * FROM application_log;

UPDATE application_log SET col2 = 'B3-updated' WHERE id = 3;

DELETE FROM application_log WHERE id = 4;

SELECT * FROM application_log;

Copy

CREATE HYBRID TABLE … AS SELECT (CTAS) or CREATE HYBRID TABLE … LIKE. For example:

CREATE OR REPLACE HYBRID TABLE dept_employees (
  employee_id INT PRIMARY KEY,
  department_id VARCHAR(200)
  )
AS SELECT employee_id, department_id FROM company_employees;

Copy

Loading data¶

Note

Because the primary storage for hybrid tables is a row store, hybrid tables typically have a larger storage footprint than standard tables. The main reason for the difference is that columnar data for standard tables often achieves higher rates of compression. For details about storage costs, see Evaluate cost for hybrid tables.

Optimized bulk loads¶

You can bulk load data into hybrid tables by copying either from a data stage or from other tables (using CTAS, COPY INTO <table>, or INSERT INTO … SELECT).

The optimization of bulk loads depends on whether the table is freshly created, without ever having any records loaded, or is created using a CTAS query.

When a hybrid table is empty, all three load methods (CTAS, COPY, and INSERT INTO … SELECT) use optimized bulk loading to speed up the load process. After the table is loaded, normal INSERT performance applies. You can still run incremental batch loads with COPY and INSERT INTO … SELECT operations, but they will typically be less efficient. Bulk load speeds of approximately 1 million records per minute are common but can widely vary based on the structure of the table (for example, larger records are slower to load). Optimized bulk loading will be extended to support incremental batch loads in a future release.

You can check the Statistics information in Snowsight query profiles to see whether the bulk-load fast path was used. Number of rows inserted is referred to as the Number of rows bulk loaded when the fast path is used. For example, this CTAS operation bulk loaded 200000 rows into a new table:

A subsequent incremental batch load into the same table would not use optimized bulk loading.

For more information about query profiles, see Analyze query profiles for hybrid tables and Monitor query activity with Query History.

Attention

CTAS commands do not support FOREIGN KEY constraints. If your hybrid table requires FOREIGN KEY constraints, use COPY or INSERT INTO … SELECT to load the table.

Note

Other methods of loading data into Snowflake tables (for example, Snowpipe) are not currently supported.

Adding indexes to a hybrid table¶

You can define keys, indexes, and constraints when you create a hybrid table. For example, add a unique constraint and secondary index to the table definition:

CREATE OR REPLACE HYBRID TABLE target_hybrid_table (
    col1 VARCHAR(32) PRIMARY KEY,
    col2 NUMBER(38,0) UNIQUE,
    col3 NUMBER(38,0),
    INDEX index_col3 (col3)
    )
  AS SELECT col1, col2, col3 FROM source_table;

Copy

Optionally, you can create an index for an existing hybrid table using CREATE INDEX. Use this command to add an index to a hybrid table that is actively being used for a workload and is serving queries, or has foreign keys. CREATE INDEX will build an index concurrently without locking the table during the operation.

Note

Check index build status with the SHOW INDEXES command. Only one index build at a time is supported.

However, if your hybrid table application is in development or test mode, and some downtime for the table is not an issue, it is more efficient to recreate the hybrid table and create the indexes by running an optimized bulk load, which is more efficient than online index building with the CREATE INDEX command.

Optimized bulk loading is supported for CTAS, COPY, and INSERT INTO … SELECT, but you cannot use CTAS if your table has a FOREIGN KEY constraint. The second table created in this example, fk_hybrid_table, would have to be bulk-loaded with COPY or INSERT INTO … SELECT:

CREATE OR REPLACE HYBRID TABLE ref_hybrid_table (
    col1 VARCHAR(32) PRIMARY KEY,
    col2 NUMBER(38,0) UNIQUE
);

CREATE OR REPLACE HYBRID TABLE fk_hybrid_table (
    col1 VARCHAR(32) PRIMARY KEY,
    col2 NUMBER(38,0),
    col3 NUMBER(38,0),
    FOREIGN KEY (col2) REFERENCES ref_hybrid_table(col2),
    INDEX index_col3 (col3)
);

Copy

All hybrid tables require a unique primary key. The data in a hybrid table is ordered by this primary key. You can create additional secondary indexes on non-primary key attributes to accelerate lookups along those attributes. Indexes may be able to reduce the number of records that are scanned when a query predicate uses one of the following conditions:

=, >, >=, <, <= (comparison operators)
[ NOT ] IN conditions
[ NOT ] BETWEEN conditions

Attention

To add a secondary index, you must use a role that is granted the SELECT privilege on the hybrid table. If you have access to a view of the data in the hybrid table, but not the table itself, you will not be able to add a secondary index.

If you have common, repeated queries with predicates on a specific attribute or a composite group of attributes, consider adding an index to that attribute or group of attributes to improve performance. Be aware of the following considerations when using indexes:

Increase in storage consumption when storing additional copies of the subset of data in the index.
Addition of overhead to DMLs because indexes are maintained synchronously.

Index-building errors during loads¶

Index sizes are limited in width. When building indexes on columns in a hybrid table, especially indexes on a large number of columns, any command that loads the table (including CTAS, COPY, or INSERT INTO … SELECT) might return the following error. In this case, the table contains an index named IDX_HT100_COLS:

The value is too long for index "IDX_HT100_COLS".

This error occurs because row-based storage imposes a limit on the size of the data (and metadata) that can be stored per record. To reduce the record size, try creating the table without specifying larger columns (such as wide VARCHAR columns) as indexed columns. You can also try creating indexes on fewer columns.

You can also try using INCLUDE columns on secondary indexes when you create the table. For a simple example, see Create a secondary index with an INCLUDE column. (INCLUDE columns are supported when you create hybrid tables, but not when you create indexes with the CREATE INDEX command.)