Best practices for hybrid tables

This topic describes best practices and important considerations when using hybrid tables. To achieve optimal performance with hybrid tables, follow these best practices in your deployment. This guide outlines specific configuration, design, and operational practices that maximize performance for production workloads.

Query performance in Snowsight versus driver-based access

Attention

Performance statistics reported in Snowsight are not indicative of query performance for driver-based workloads.

Snowsight provides rich access to query plans, data statistics, query history, and other detailed information that is useful for interactive query prototyping, debugging, investigation, monitoring, and other activities. Providing that rich interactive experience adds overhead to the Snowflake query engine. As such, latency for short-running queries executed through Snowsight is not indicative of performance that can be achieved with programmatic drivers. Queries executed via code-based or driver-based solutions execute with lower latency and variability than queries executed via Snowsight.

Note

Run a simple performance test to validate performance for your scenario.

Client drivers for hybrid tables

In order to access hybrid tables, you will need to use one of the following driver versions:

Driver

Minimum Version

Go

1.6.25

JDBC

3.13.31

.Net

2.1.2

Node.js

1.9.0

ODBC

3.0.2

PHP

2.0.0

Python Connector

3.1.0

SnowSQL

1.2.28

Note

You may not be able to access hybrid tables using an earlier driver version.

For optimal performance with hybrid tables, be sure to use the latest version of your selected driver.

You can also access hybrid tables by using the Snowflake SQL API; however, this API is not recommended for use cases that require optimal latency.

Client configuration and access methods

Connection management directly affects performance and scalability. When connecting to databases that contain hybrid tables, consider the following best practices for achieving good performance.

  • Use connection pooling with long-lived connections to eliminate the overhead of repeatedly establishing new connections. Most client frameworks that connect to Snowflake provide a connection-pooling mechanism to efficiently manage access.

  • Network proximity significantly affects end-to-end latency; therefore, colocate your client software in the same cloud region as the Snowflake account.

  • Use prepared statements with bound parameters so the query planner will reuse previously created query plans.

  • Use the supported programmatic client drivers, not Snowsight, to achieve optimal latency. See Client drivers for hybrid tables.

Index design and usage

Creating and using indexes is a key component to achieving optimal performance for hybrid tables. Consider the following recommendations:

  • Create secondary indexes for frequently used predicates.

  • Design composite indexes to match complete query patterns.

  • Avoid using multiple indexes with columns in the same ordinal position.

  • Understand the cardinality of your data before creating indexes. Indexes built with a single, low-cardinality column have limited benefit. See Estimating the Number of Distinct Values.

  • Indexes add write overhead and storage requirements. Be careful to balance read versus write performance for applications that require low-latency write operations.

Properly designed indexes significantly improve query performance by providing efficient data access paths. If possible, choose primary keys for optimal selectivity while minimizing complexity. In some cases, adding columns with calculated or surrogate key values provides better performance than complex composite indexes. Secondary indexes dramatically improve performance for frequently accessed columns.

For well-defined queries, using the INCLUDE keyword to add columns to an index when you create the table might further decrease latency. See Create a secondary index with an INCLUDE column.

Attention

Be mindful of the indexes you create on a hybrid table; non-selective index scans result in sub-optimal performance, throttling, and higher cost.

Queries that qualify for index use

Hybrid table indexes may be accessed when queries use one of the following conditions:

  • <column_reference> {=, >, >=, <, <=} <constant_value>

  • <column_reference> IN <constant_in_list>

  • <column_reference> BETWEEN <constant_value> AND <constant_value>

Expressions can be chained together using Logical operators.

For example:

CREATE OR REPLACE HYBRID TABLE icecream_orders (
  id NUMBER PRIMARY KEY AUTOINCREMENT START 1 INCREMENT 1 ORDER,
  store_id NUMBER NOT NULL,
  flavor VARCHAR(20) NOT NULL,
  order_ts TIMESTAMP_NTZ,
  num_scoops NUMERIC,
  INDEX idx_icecream_order_store (store_id, order_ts),
  INDEX idx_icecream_timestamp (order_ts)
  );

-- Generate sample data for testing

INSERT INTO icecream_orders (store_id, flavor, order_ts, num_scoops)
  SELECT
    UNIFORM(1, 10, RANDOM()),
    ARRAY_CONSTRUCT('CHOCOLATE', 'VANILLA', 'STRAWBERRY', 'LEMON')[UNIFORM(0, 3, RANDOM())],
    DATEADD(SECOND, UNIFORM(0, 86400, RANDOM()), DATEADD(DAY, UNIFORM(-90, 0, RANDOM()), CURRENT_DATE())),
    UNIFORM(1, 3, RANDOM())
  FROM TABLE(GENERATOR(ROWCOUNT => 10000))
  ;

-- Use idx_icecream_order_store (first column)

  SELECT *
    FROM icecream_orders
    WHERE store_id = 5;

-- Use idx_icecream_order_store (both columns)

  SELECT *
    FROM icecream_orders
    WHERE store_id IN (1,2,3) AND order_ts > DATEADD(DAY, -7, CURRENT_DATE());

-- Use idx_icecream_timestamp

  SELECT *
    FROM icecream_orders
    WHERE order_ts BETWEEN DATEADD(DAY, -2, CURRENT_DATE()) AND DATEADD(DAY, -2, CURRENT_DATE());
Copy

Bulk loading data

You can use several optimizations and best practices for loading data into hybrid tables:

Hybrid tables provide an optimized bulk loading path that delivers up to 10x faster loading performance than standard loading methods. This optimized bulk loading path is automatically applied when you load data into an empty table using CTAS (CREATE TABLE AS SELECT), COPY INTO, or INSERT INTO SELECT commands. (An empty table is a table that has never contained any data.)

You can verify that the optimization is being used by checking the statistics section of the query profile, where rows will be reported as Number of rows bulk loaded rather than Number of rows inserted.

Note

CTAS operations do not support FOREIGN KEY constraints. If your table requires foreign keys, you must use COPY or INSERT INTO SELECT instead.

For tables that already contain data, the optimized bulk loading path is not currently available. In these cases, loading operations may achieve approximately 1 million records per minute, though this varies based on record size, table structure, and number of indexes.

Warehouse optimization

A warehouse of size X-Small is sufficient for many operational workloads. In order to achieve higher concurrency and throughput on short-running operational queries, increase the compute node count by using a multi-cluster warehouse rather than increasing compute resources with a larger warehouse.

If your workload has variable throughput patterns, you can enable autoscaling to reduce consumption when demand is lower. Set the scaling policy to Standard rather than Economy for the best performance and efficiency on workloads that require high throughput or low latency. For more information, see Setting the scaling policy for a multi-cluster warehouse.

In some cases, isolating workloads in separate warehouses might be beneficial to enable independent scaling. If you have a mixed hybrid workload with operational and analytical components, it is beneficial to separate the operational and analytical components into different warehouses. If you cannot separate them and must execute them together on the same warehouse, choose the warehouse size based on the analytical query latency requirements and choose the multi-cluster node count based on what is required to support your workload’s throughput.

Caching and warm-up

The first hybrid table query issued to a newly started warehouse triggers activities such as query planning, index selection, I/O to load data, caching decisions, and, of course query execution. The query engine continues to optimize memory and storage for the query. This time is called the “warm-up” period. Query latency drops until the engine converges on a steady-state latency.

  • Use dedicated warehouses for hybrid table workloads to avoid cache interference.

  • Understand that reaching steady-state latency takes from several seconds to 2-3 minutes as the cache warms up.

  • Configure auto-suspend and auto-scaling to balance efficiency and cache warmth.

Hybrid tables utilize multiple caching approaches to optimize performance. The plan cache reduces compilation overhead by storing frequently used query plans. The column store data cache maintains frequently accessed data in memory, and the metadata cache provides rapid access to table and index information. Hybrid tables do not use a result cache.

These caches require some time to optimize for your workload patterns. Using dedicated warehouses for hybrid table workloads prevents cache interference from other workloads. Initial queries after a cold start experience higher latency until caches are populated. If your workload has variable throughput patterns, you can enable autoscaling and auto-suspend to reduce consumption or suspend your warehouse when demand is lower. When your warehouse restarts or auto-scales to add a new cluster, caches will need to rehydrate. Set the scaling policy to Standard rather than Economy for the best performance. see Multi-cluster warehouses.

Stored procedures and hybrid tables

Stored procedures are supported for hybrid tables; however, executing transactions with AUTOCOMMIT enabled or multi-statement transactions offers better performance and efficiency than calling a stored procedure.

Serverless tasks and hybrid tables

While serverless tasks are supported, be aware that you may not experience optimal performance or efficiency for workloads that use hybrid tables.

Performance monitoring

The recommended view to use for hybrid table performance monitoring is the AGGREGATE_QUERY_HISTORY view. This view contains query execution details aggregated over a short period of time.

For example, to retrieve the average default interval performance over the last 24 hours for a warehouse serving hybrid table requests:

SELECT *
  FROM SNOWFLAKE.ACCOUNT_USAGE.AGGREGATE_QUERY_HISTORY
  WHERE warehouse_name = 'HYBRID_TABLES_WAREHOUSE'
  AND query_type = 'SELECT'
  AND interval_start_time >= DATEADD(hour, -24, CURRENT_TIMESTAMP());
Copy

See the AGGREGATE_QUERY_HISTORY view for more examples.

Monitoring quotas and throttling

Hybrid tables implement quota controls at the account level for both hybrid storage and hybrid table requests throughput. These quotas ensure consistent performance across all users. The default quotas are sufficient for most initial implementations, but may need adjustment as workloads grow.

  • Monitor the hybrid table requests quota by using the AGGREGATE_QUERY_HISTORY view.

  • Monitor the hybrid storage quota by using the STORAGE_USAGE view.

  • High throttling percentages in query profiles indicate you’re approaching throughput limits. When you consistently utilize more than 70% of either quota, proactively request an increase through Snowflake Support.

The performance of hybrid tables is subject to throttling even in a case where virtual warehouse compute usage is not high. To monitor your usage and determine whether a hybrid table is being throttled, see the example in the AGGREGATE_QUERY_HISTORY view. You can also retrieve the number of throttled hybrid table requests from the HYBRID_TABLE_REQUESTS_THROTTLED_COUNT column.

For more information, see Quotas and throttling.

Troubleshooting performance issues

If you’re not achieving expected performance after implementing these best practices, Snowflake Support can help analyze and optimize your implementation. When creating a support case, include the following information to enable rapid resolution:

  • Query IDs (UUIDs) for representative queries showing suboptimal performance

  • Workload characteristics:

    • Typical query patterns

    • Expected versus actual latency

    • Concurrency requirements

    • Data storage volumes

    • Query response row size

    • Column cardinality estimates

  • Any recent changes to table schemas, indexes, or workload patterns

  • Throttling metrics from query profiles

  • Performance differences between cold and warm warehouses

Include both fast and slow examples of similar queries if possible to help identify optimization opportunities. This comparison helps support teams quickly identify potential configuration or design improvements.