Using the Search Optimization Service

The search optimization service can significantly improve the performance of point lookup queries.

In this Topic:

Understanding the Search Optimization Service

The search optimization service aims to significantly improve the performance of selective point lookup queries on large tables. A point lookup query returns only one or a small number of distinct rows. Use case examples include:

  • Business users who need fast response times for critical dashboards with highly selective filters.

  • Data scientists who are exploring large data volumes and looking for specific subsets of data.

A user can register one or more tables to the search optimization service. Search optimization is a table-level property and applies to all columns with supported data types (see the list of supported data types further below).

How Does the Search Optimization Service Work?

To improve performance for point lookups, the search optimization service relies on a persistent data structure that serves as an optimized search access path.

A maintenance service that runs in the background is responsible for creating and maintaining the search access path:

  • When you add search optimization to a table, the maintenance service creates and populates the search access path with the data needed to perform the lookups.

    The process of populating data can take time, depending on the size of the table. The service does this work in the background and does not block any concurrent operations on the table.

  • When data in the table is updated (for example, by loading new data sets or through DML operations), the maintenance service automatically updates the search access path to reflect the changes to the data.

    If queries are run when the search access path hasn’t been updated yet, the queries might run slower but will always return up-to-date results.

This search access path and the maintenance service are transparent to the user. You don’t need to create a warehouse for the service that maintains the search access path.

However, you do need to be aware of the service because there is a cost for the storage and compute resources for this service. See Managing the Costs of the Search Optimization Service.

Considering Other Solutions for Optimizing Query Performance

The search optimization service is one of several ways to optimize query performance. Related techniques include:

  • Clustering a table.

  • Creating one or more materialized views (clustered or unclustered).

Each of these has different advantages:

  • Clustering a table can speed any of the following, as long as they are on the clustering key:

    • Range searches.

    • Equality searches.

    However, a table can be clustered on only a single key (which can contain one or more columns or expressions).

  • The search optimization service speeds only equality searches. However, this applies to all the columns of supported types in a table that has search optimization enabled.

  • A materialized view speeds both equality searches and range searches, as well as some sort operations, but only for the subset of rows and columns included in the materialized view. Materialized views can be also used in order to define different clustering keys on the same source table (or a subset of that table), or in conjunction with flattening JSON / variant data.

The following table shows which of these three optimizations have storage or compute costs:

Storage Cost

Compute Cost

Search Optimization Service

Materialized View

Clustering the Table

What Access Control Privileges Are Needed For the Search Optimization Service?

To add or remove search optimization for a table, you must have the following privileges:

  • You must have OWNERSHIP privilege on the table.

  • You must have ADD SEARCH OPTIMIZATION privilege on the schema that contains the table.

    GRANT ADD SEARCH OPTIMIZATION ON SCHEMA <schema_name> TO ROLE <role>;
    

To use the search optimization service for a query, you just need SELECT privileges on the table.

You do not need any additional privileges. Because search optimization is a table property, it is automatically detected and used (if appropriate) when querying a table.

Identifying the Tables That Benefit From Search Optimization

The search optimization service is designed to improve the performance of certain types of queries. The following sections explain how to identify the tables and types of queries that can benefit from search optimization.

Queries That Benefit From Search Optimization

Search optimization works best to improve the performance of a query when the following conditions are true:

  • For the table being queried:

    • The table size is at least 100 GB.

      For smaller tables (e.g., less than 10 GB in size), the search optimization service does not improve query performance enough to justify the costs.

    • Either of the following is true:

      • The table is not clustered.

      • The table is frequently queried on columns other than the primary cluster key.

  • For the query:

    • The query typically runs for at least tens of seconds.

    • At least one of the columns accessed through the query filter operation has at least 100k-200k distinct values.

      To determine the number of distinct values, you can use either of the following:

      • Use APPROX_COUNT_DISTINCT to get the approximate number of distinct values:

        select approx_count_distinct(column1) from table1;
        
      • Use COUNT(DISTINCT <col_name>) to get the actual number of distinct values:

        select count(distinct c1), count (distinct c2)  from test_table;
        

      Because you need only an approximation of the number of distinct values, consider using APPROX_COUNT_DISTINCT, which is generally faster and cheaper than COUNT(DISTINCT <col_name>).

    • The query uses the following kinds of predicates:

      • Equality predicates (for example, <column_name> = <constant>).

      • Predicates that use IN (see example).

    • For queries that use conjunctions of predicates (i.e., AND), query performance can be improved by search optimization if any of the predicates adhere to the conditions above.

      For example, suppose that a query has:

      where condition_x and condition_y

      Search optimization can improve performance if either condition separately returns a few rows (i.e., condition_x returns few a rows or condition_y returns a few rows).

      If condition_x returns a few rows but condition_y returns many rows, the query performance can benefit from search optimization.

      See these additional examples.

    • For queries that use disjunctions of predicates (i.e., OR), query performance can be improved by search optimization if all predicates adhere to the conditions above.

      For example, suppose that a query has:

      where condition_x or condition_y

      Search optimization can improve performance if each condition separately returns a few rows (i.e., condition_x returns a few rows and condition_y returns a few rows).

      If condition_x returns a few rows but condition_y returns many rows, the query performance does not benefit from search optimization.

      In the case of disjunctions, each predicate in isolation is not decisive in the query. The other predicates need to be evaluated before it can be determined if search optimization can improve performance.

    • For queries that use conjunctions and disjunctions (i.e., AND and OR), the search optimization service can improve performance only if AND is at the top level.

      For example, the search optimization service can improve the performance of a query with:

      WHERE a = 1 AND (b = 2 OR c = 3)

      but not with:

      WHERE a = 1 OR (b = 2 AND c = 3)

How Search Optimization Affects Joins

The search optimization service does not directly improve the performance of joins. However, it can improve the performance of filtering rows from either table prior to the join, if the table has search optimization enabled and the predicate is selective.

Both tables do not need to have search optimization enabled. The decision to use search optimization is made for each table independently.

How Search Optimization Affects Views

As is the case with joins, the search optimization service can indirectly improve the performance of views. If a base table for a view has search optimization enabled and if the query uses a selective predicate for that table, the search optimization service can improve performance when filtering rows.

All tables in the view do not need to have search optimization enabled. Search optimization is performed on each table independently.

Current Limitations of the Search Optimization Service

The search optimization service does not support the following:

  • External tables.

  • Materialized views.

  • Column concatenation.

  • Analytical expressions.

  • Casts on table columns.

    Although search optimization supports predicates with implicit and explicit casts on constant values, it does not support predicates that cast values in the actual table column.

    For example, the following predicates are supported because they use implicit and explicit casts on constant values (not values in the table column):

    -- Supported predicate
    -- (where the numeric constant 3 is implicitly cast to a varchar)
    WHERE varchar_column = 3
    
    -- Supported predicate
    -- (where the numeric constant 3 is explicitly cast to a varchar)
    WHERE varchar_column = to_varchar(3)
    

    The following predicate is not supported because it uses a cast on values in the table column:

    -- Unsupported predicate
    -- (where values in a numeric column are cast to a string)
    WHERE cast(numeric_column as varchar) = '2'
    

    The search optimization service considers the original column values, not the values after the cast. As a result, the search optimization service is not used for queries with these predicates.

Search optimization does not improve performance of queries that use Time Travel because search optimization works only on active data.

Data Types Supported By the Search Optimization Service

The search optimization service currently supports equality predicate and IN list predicate searches for the following data types (i.e. fixed-size data types):

  • Fixed-point numbers (e.g. INTEGER, NUMERIC).

  • DATE, TIME, and TIMESTAMP.

  • VARCHAR.

  • BINARY.

Currently, the search optimization service does not support floating point data types, semi-structured data types, or other data types not listed above. Snowflake might add support for more data types in the future.

Adding Search Optimization to a Table

To add search optimization to a table, follow these steps:

  1. Switch to a role that has the privileges to add search optimization to the table.

  2. Run the following command:

    ALTER TABLE [IF EXISTS] <table_name> ADD SEARCH OPTIMIZATION;
    

    where table_name is the name of the table for which to add search optimization.

    For example:

    alter table test_table add search optimization;
    

    For more information, see the section on search optimization in ALTER TABLE.

  3. Run the SHOW TABLES command to verify that search optimization has been added and to determine how much of the table has been optimized.

    For example:

    SHOW TABLES LIKE '%test_table%';
    

    In the output from this command:

    • Verify that SEARCH_OPTIMIZATION is ON, which indicates that search optimization has been added.

    • Check the value of SEARCH_OPTIMIZATION_PROGRESS. This specifies the percentage of the table that has been optimized so far.

      When search optimization is first added to a table, the performance benefits do not appear immediately. The search optimization service starts populating data in the background. The benefits appear increasingly as the maintenance catches up to the current state of the table.

      Before you run a query to verify that search optimization is working, wait until this shows that the table has been fully optimized.

  4. Run a query to verify that search optimization is working.

    Note that the Snowflake optimizer automatically chooses when to use the search optimization service for a particular query. Users cannot control which queries search optimization is used for.

    Choose a query that the search optimization service is designed to optimize. See Identifying the Tables That Benefit From Search Optimization.

  5. In the web UI, view the query plan for this query, and verify that the query node “Search Optimization Access” is part of the query plan.

Managing the Costs of the Search Optimization Service

The search optimization service impacts costs for both storage and compute resources:

  • Storage resources: The search optimization service creates a search access path data structure that requires space for each table on which search optimization is enabled. The storage cost of the search access path depends upon multiple factors, including:

    • The number of distinct values (NDVs) in the table. In the extreme case where all columns have data types that use the search access path, and all data values in each column are unique, the required storage can be as much as the original table’s size.

      Typically, however, the size is approximately 1/4 of the original table’s size.

  • Compute resources:

    • Adding search optimization to a table consumes resources.

    • Maintaining the search optimization service also requires resources. Resource consumption is higher when there is high churn (i.e. when large volumes of data in the table change). These costs are roughly proportional to the amount of data ingested (added or changed). Deletes also have some cost.

      Snowflake ensures efficient credit usage by billing your account only for the actual resources used. Billing is calculated in 1-second increments.

      See the “Serverless Feature Credits Table” in the Snowflake Service Consumption Table for the costs per compute hour.

      Once you enable the search optimization service, you can view the costs for your use of the service.

Tip

Snowflake recommends starting slowly with this feature (i.e. adding search optimization to only a few tables at first) and closely monitoring the costs and benefits.

Estimating the Costs

To estimate the cost of adding search optimization to a table, use the SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS function.

In general, the costs are proportional to:

  • The number of tables on which the feature is enabled, and the number of distinct values in those tables.

  • The amount of data that changes in these tables.

Viewing the Costs

You can view the billing costs for the search optimization service by using either the web interface or SQL. See Understanding Billing for Serverless Features.

Reducing the Costs

You can control the cost of the search optimization service by carefully choosing the tables for which to enable search optimization.

In addition, to reduce the cost of the search optimization service:

  • Snowflake recommends batching DML operations on the table:

    • DELETE: If tables store data for the most recent time period (e.g. the most recent day or week or month), then when you trim your table by deleting old data, the search optimization service must take into account the updates. In some cases, you might be able to reduce costs by deleting less frequently (e.g. daily rather than hourly).

    • INSERT, UPDATE, and MERGE: Batching these types of DML statements on the table can reduce the cost of maintenance by the search optimization service.

  • If you recluster the entire table, consider dropping the search optimization property for that table before reclustering, and then re-adding the search optimization service after reclustering.

Removing the Search Optimization Property From a Table

To remove the search optimization property from a table:

  1. Switch to a role that has the privileges to remove search optimization from the table.

  2. Run the following command:

    ALTER TABLE [IF EXISTS] <table_name> DROP SEARCH OPTIMIZATION;
    

    For example:

    alter table test_table drop search optimization;
    

    For more information, see the section on search optimization in ALTER TABLE.

Examples

The following code shows creation and use of a search optimization service.

Start by creating a table with data:

create or replace table test_table (id int, c1 int, c2 string, c3 date) as
select * from values
  (1, 3, '4',  '1985-05-11'),
  (2, 4, '3',  '1996-12-20'),
  (3, 2, '1',  '1974-02-03'),
  (4, 1, '2',  '2004-03-09'),
  (5, null, null,  null);

Add the search optimization property to the table:

alter table test_table add search optimization;

The following queries can use the search optimization service:

select * from test_table where id = 2;
select * from test_table where c2 = '1';
select * from test_table where c3 = '1985-05-11';
select * from test_table where c1 is null;
select * from test_table where c1 = 4 and c3 = '1996-12-20';

The following query can use the search optimization because the implicit cast is on the constant, not the column:

select * from test_table where c2 = 1;

The following cannot use the search optimization because the cast is on the table’s column:

select * from test_table where cast(c1 as string) = '2';

An IN clause is compatible with search optimization:

select id, c1, c2, c3
    from test_table
    where id IN (2, 3)
    order by id;

If predicates are individually compatible with search optimization, then they can be joined by the conjunction AND and still be compatible with search optimization:

select id, c1, c2, c3
    from test_table
    where c1 = 1
       and
          c3 = TO_DATE('2004-03-09')
    order by id;

DELETE and UPDATE (and MERGE) can also use the search optimization service:

delete from test_table where id = 3;
update test_table set c1 = 99 where id = 4;