Using the Search Optimization Service

The search optimization service can significantly improve the performance of certain types of lookup and analytical queries that use an extensive set of predicates for filtering.

To start with a tutorial that compares execution time with and without search optimization, go to Getting Started with Search Optimization

Understanding the Search Optimization Service

The search optimization service aims to significantly improve the performance of certain types of queries on tables, including:

  • Selective point lookup queries on tables. A point lookup query returns only one or a small number of distinct rows. Use case examples include:

    • Business users who need fast response times for critical dashboards with highly selective filters.

    • Data scientists who are exploring large data volumes and looking for specific subsets of data.

    • Data applications retrieving a small set of results based on an extensive set of filtering predicates.

  • Substring and regular expression searches (e.g. LIKE, ILIKE, RLIKE, etc.).

  • Queries on fields in VARIANT, OBJECT, and ARRAY columns that use certain types of predicates:

    • Equality predicates.

    • IN predicates.

    • Predicates that use ARRAY_CONTAINS.

    • Predicates that use ARRAYS_OVERLAP.

    • Substring and regular expression predicates.

    • Predicates that check for NULL values.

  • Queries that use selected geospatial functions with GEOGRAPHY values.

Once you identify the queries that can benefit from the search optimization service, you can configure search optimization for the columns and tables used in those queries.

How Does the Search Optimization Service Work?

To improve performance for point lookups, the search optimization service creates and maintains a persistent data structure called a search access path. The search access path keeps track of which values of the table’s columns might be found in each of its micro-partitions, allowing some to be skipped when scanning the table.

A maintenance service is responsible for creating and maintaining the search access path:

  • When you configure search optimization for a table, the maintenance service creates and populates the search access path with the data needed to perform the lookups.

    The process of populating data can take significant time, depending on the size of the table. The maintenance service works in the background and does not block any concurrent operations on the table.

  • When data in the table is updated (for example, by loading new data sets or through DML operations), the maintenance service automatically updates the search access path to reflect the changes to the data.

    If queries are run when the search access path hasn’t been updated, queries might run slower, but will still return up-to-date results.

The progress of each table’s maintenance service appears in the search_optimization_progress column in the output of SHOW TABLES. Before you measure the performance improvement of search optimization on a newly-optimized table, make sure this column shows that the table has been fully optimized.

This search access path and the maintenance service are transparent to the user. You don’t need to create a warehouse for the service that maintains the search access path.

However, note that there is a cost for the storage and compute resources for this service. For more details, refer to Managing the Costs of the Search Optimization Service (in this topic).

Considering Other Solutions for Optimizing Query Performance

The search optimization service is one of several ways to optimize query performance. Related techniques include:

  • Query acceleration.

  • Clustering a table.

  • Creating one or more materialized views (clustered or unclustered).

Each of these has different advantages, as shown in the following table:

Feature

Supported Query Types

Notes

Search Optimization Service

The search optimization service can improve the performance of these types of searches for the supported data types.

Query Acceleration Service

Queries with filters or aggregation. If the query includes LIMIT, the query must also include ORDER BY.
The filters must be highly selective, and the ORDER BY clause must have a low cardinality.

Query acceleration works well with ad-hoc analytics, queries with unpredictable data volume,
and queries with large scans and selective filters.

Query acceleration and search optimization are complementary. Both can accelerate the same query. See Compatibility with Query Acceleration.

Materialized View

  • Equality searches.

  • Range searches.

  • Sort operations.

Note: Performance can be improved only for the subset of rows and columns included in the materialized view.

You can also use materialized views to define different clustering keys on the same source table (or a subset of that table), or to store flattened JSON or variant data so it only needs to be flattened once.

Clustering the Table

  • Equality searches.

  • Range searches.

Note: A table can be clustered only on a single key, which can contain one or more columns or expressions.

The following table shows which of these three optimizations have storage or compute costs:

Storage Cost

Compute Cost

Search Optimization Service

Query Acceleration Service

Materialized View

Clustering the Table

1

1

The process of reclustering can increase the size of Fail-safe storage due to the rewriting of existing partitions into new partitions. (Note that this process does not introduce any new rows. This just reorganizes existing rows.) For details, see Credit and Storage Impact of Reclustering.

What Access Control Privileges Are Needed For the Search Optimization Service?

To add, configure, or remove search optimization for a table, you must have the following privileges:

  • You must have OWNERSHIP privilege on the table.

  • You must have ADD SEARCH OPTIMIZATION privilege on the schema that contains the table.

    GRANT ADD SEARCH OPTIMIZATION ON SCHEMA <schema_name> TO ROLE <role>
    
    Copy

To use the search optimization service for a query, you just need the SELECT privilege on the table.

You do not need any additional privileges. Because SEARCH OPTIMIZATION is a table property, it is automatically detected and used (if appropriate) when querying a table.

Identifying the Tables and Columns That Benefit From Search Optimization

Search optimization works best to improve the performance of a query when the table is frequently queried on columns other than the primary cluster key.

The search optimization service currently supports specific types of queries for the following data types:

  • Fixed-point numbers (e.g. INTEGER, NUMERIC).

  • DATE, TIME, and TIMESTAMP.

  • VARCHAR.

  • BINARY.

  • VARIANT, OBJECT, and ARRAY.

  • GEOGRAPHY.

Currently, the search optimization service does not support floating point data types, GEOMETRY, or other data types not listed above. Snowflake might add support for more data types in the future.

The search optimization service also does not support collations.

Identifying Queries That Benefit From Search Optimization

Search optimization works best to improve the performance of the following types of queries:

  • A query that typically runs for a few seconds or longer (before applying search optimization). In most cases, search optimization will not substantially improve the performance of a query that has a sub-second execution time.

  • A query in which at least one of the columns accessed through the query filter operation has at least 100,000 to 200,000 distinct values.

    To determine the number of distinct values, you can use either of the following:

    • Use APPROX_COUNT_DISTINCT to get the approximate number of distinct values:

      select approx_count_distinct(column1) from table1;
      
      Copy
    • Use COUNT(DISTINCT <col_name>) to get the actual number of distinct values:

      select count(distinct c1), count (distinct c2)  from test_table;
      
      Copy

    Because you need only an approximation of the number of distinct values, consider using APPROX_COUNT_DISTINCT, which is generally faster and cheaper than COUNT(DISTINCT <col_name>).

Search optimization can improve the performance of these types of queries

Equality or IN Predicates

The search optimization service can improve the performance of queries that use:

  • Equality predicates (for example, <column_name> = <constant>).

  • Predicates that use IN (see example).

Substrings and Regular Expressions

The search optimization service can improve the performance of queries with predicates that search for substrings or use regular expressions in text or semi-structured data. (For details on how substring searches work with semi-structured data, see Substring Search in VARIANT Types.)

This capability includes predicates that use:

Note

You must enable this feature for specific columns using the ON clause in the ALTER TABLE … ADD SEARCH OPTIMIZATION command. (Enabling search optimization at the table level does not optimize substring search.) For example:

ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON SUBSTRING(mycol);
Copy

Substring searches are not optimized if you omit the ON clause.

The search optimization service can improve performance when searching for substrings that are 5 or more characters long. (More selective substrings can result in better performance.) The search optimization service does not use search access paths for the following predicate because the substring is shorter than 5 characters:

LIKE '%TEST%'
Copy

For the following predicate, the search optimization service can optimize this query, using search access paths to search for the substrings for SEARCH and OPTIMIZED. However, search access paths are not used for IS because the substring is shorter than 5 characters.

LIKE '%SEARCH%IS%OPTIMIZED%'
Copy

For queries that use RLIKE, REGEXP, and REGEXP_LIKE against text:

  • The subject argument must be a TEXT column in a table that has search optimization enabled.

  • The pattern argument must be a string constant.

For regular expressions, the search optimization service works best when:

  • The pattern contains at least one substring literal that is 5 or more characters long.

  • The pattern specifies that the substring should appear at least once.

For example, the following pattern specifies that string should appear one or more times in the subject:

RLIKE '(string)+'
Copy

The search optimization service can improve the performance of queries with the following patterns because each predicate specifies that a substring of 5 or more characters must appear at least once. (Note that the first example uses a dollar-quoted string constant to avoid escaping the backslash characters.)

RLIKE $$.*email=[\w\.]+@snowflake\.com.*$$
Copy
RLIKE '.*country=(Germany|France|Spain).*'
Copy
RLIKE '.*phone=[0-9]{3}-?[0-9]{3}-?[0-9]{4}.*'
Copy

In contrast, the search optimization service does not use search access paths for queries with the following patterns:

  • Patterns without any substrings:

    RLIKE '.*[0-9]{3}-?[0-9]{3}-?[0-9]{4}.*'
    
    Copy
  • Patterns that only contain substrings shorter than 5 characters:

    RLIKE '.*tel=[0-9]{3}-?[0-9]{3}-?[0-9]{4}.*'
    
    Copy
  • Patterns that use the alternation operator where one option is a substring shorter than 5 characters:

    RLIKE '.*(option1|option2|opt3).*'
    
    Copy
  • Patterns in which the substring is optional:

    RLIKE '.*[a-zA-z]+(string)?[0-9]+.*'
    
    Copy

Even when the substring literals are shorter than 5 characters, the search optimization service can still improve query performance if expanding the regular expression produces a substring literal that is 5 characters or longer.

For example, consider the pattern:

.*st=(CA|AZ|NV).*(-->){2,4}.*
Copy

In this example:

  • Although the substring literals (e.g. st=, CA, etc) are shorter than 5 characters, the search optimization service recognizes that the substring st=CA, st=AZ, or st=NV (each of which is 5 characters long) must appear in the text.

  • Similarly, even though the substring literal --> is shorter than 5 characters, the search optimization service determines that the substring -->--> (which is longer than 5 characters) must appear in the text.

The search optimization service can use search access paths to match these substrings, which can improve the performance of the query.

Fields in VARIANT Columns

The search optimization service can improve the performance of point lookup and substring queries on semi-structured data in Snowflake tables (data in VARIANT, OBJECT, and ARRAY columns).

When VARIANT support for the search optimization service is configured for columns in a table, the search optimization service automatically includes VARIANT, OBJECT, and ARRAY columns in a search access path. This even applies to columns where the structure is deeply nested and the structure changes frequently. You may also enable search optimization for specific fields within a semi-structured column.

Note

You must explicitly enable this feature for specific columns or fields in columns using the ON clause in the ALTER TABLE … ADD SEARCH OPTIMIZATION command. (Enabling search optimization at the table level does not enable it for VARIANT columns.) For example:

ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON EQUALITY(myvariantcol);
ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(c4:user.uuid);

ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON SUBSTRING(myvariantcol);
ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON SUBSTRING(c4:user.uuid);
Copy

Queries against a VARIANT column are not optimized if you omit the ON clause.

The next sections provide more details about this support:

Supported Data Types for Constants and Casts in Predicates for VARIANT Types

The search optimization service can improve the performance of point lookups of semi-structured data where the following types are used for the constant and the implicit or explicit cast for the element:

  • FIXED (including casts that specify a valid precision and scale)

  • INTEGER

  • TEXT

  • DATE (including casts that specify a scale)

  • TIME (including casts that specify a scale)

  • TIMESTAMP, TIMESTAMP_LTZ, TIMESTAMP_NTZ, TIMESTAMP_TZ (including casts that specify a scale)

The search optimization service supports the casting of types using:

Support for VARIANT Values Cast as TEXT

The search optimization service can also improve the performance of point lookups in which VARIANT columns are cast to TEXT and are compared to constants that are cast to TEXT.

For example, suppose that src is a VARIANT column containing boolean, date, and time values that have been converted to VARIANT:

create or replace TABLE test_table
(
    ID INTEGER,
    SRC VARIANT
);

insert into test_table select 1, to_variant('true'::boolean);          -- BOOLEAN
insert into test_table select 2, to_variant('2020-01-09'::date);       -- DATE
insert into test_table select 3, to_variant('01:02:03.899213'::time);  -- TIME
Copy

For this table, the search optimization service can improve the following queries, which cast the VARIANT column to TEXT and compare the column to string constants:

select * from test_table where src::TEXT = 'true';
select * from test_table where src::TEXT = '2020-01-09';
select * from test_table where src::TEXT = '01:02:03.899213';
Copy

Supported Predicates for Point Lookups on VARIANT Types

The search optimization service can improve point lookup queries with the types of predicates listed below. In the examples below, src is the VARIANT column, and path_to_variant_field is a path to a field in the VARIANT column.

  • Equality predicates of the following form:

    where path_to_variant_field[::target_data_type] = constant


    target_data_type (if specified) and the data type of constant must be one of the supported types listed above.

    Note that :: is just an example of one of the supported ways of casting the value to a specific type.

    For example, the search optimization service supports:

    • Matching an element against a NUMBER constant without explicitly casting the element.

      where src:person.age = 42;
      
      Copy
    • Explicitly casting an element to NUMBER with a specified precision and scale.

      where src:location.temperature::NUMBER(8, 6) = 23.456789;
      
      Copy
    • Matching an element against a TEXT constant without explicitly casting the element.

      where src:sender_info.ip_address = '123.123.123.123';
      
      Copy
    • Explicitly casting an element to TEXT.

      where src:salesperson.name::TEXT = 'John Appleseed';
      
      Copy
    • Explicitly casting an element to DATE.

      where src:events.date::DATE = '2021-03-26';
      
      Copy
    • Explicitly casting an element to TIME with a specified scale.

      where src:events.time_info::TIME(6) = '01:02:03.456789';
      
      Copy
    • Explicitly casting an element to TIMESTAMP with a specified scale.

      where src:event_logs.exceptions.timestamp_info(3) = '2021-03-26 15:00:00.123 -0800';
      
      Copy
  • Predicates that use the ARRAY functions, such as:


    • where ARRAY_CONTAINS(constant::VARIANT, path_to_variant_field)

      constant must not be NULL, and the data type of constant must be one of the supported types listed above.

      Note that :: is just an example of one of the supported ways of casting the value to a specific type.

      For example:

      where ARRAY_CONTAINS('77.146.211.88'::VARIANT, src:logs.ip_addresses)
      
      Copy
    • where ARRAYS_OVERLAP(ARRAY_CONSTRUCT(constant_1, constant_2, .., constant_N), path_to_variant_field)

      The data type of each constant (constant_1, constant_2, etc.) must be one of the supported types listed above. The constructed ARRAY can include NULL constants.

      For example:

      where ARRAYS_OVERLAP(
          ARRAY_CONSTRUCT('122.63.45.75', '89.206.83.107'), src:senders.ip_addresses)
      
      Copy
  • The following predicates that check for NULL values:

    • where IS_NULL_VALUE(path_to_variant_field)

      Note that IS_NULL_VALUE applies to JSON null values and not to SQL NULL values.

    • where path_to_variant_field IS NOT NULL

    • where variant_column IS NULL

      where variant_column refers to the column and not a path to an element in the semi-structured data.

      For example, the search optimization service supports using the VARIANT column src but not the path to the field src:person.age in that VARIANT column.

Substring Search in VARIANT Types

The search optimization service can optimize wildcard or regular expression searches in semi-structured columns (including ARRAY, OBJECT, and VARIANT columns) or fields in such columns. This includes predicates that use:

You can enable substring search optimization for a column or for multiple individual fields within a column. For example, the following statement enables substring search optimization for a nested field in a column.

ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
Copy

Once the search access path has been built, the following query can be optimized:

SELECT * FROM test_table WHERE col2:data.search LIKE '%optimization%';
Copy

However, the following queries are not optimized because they are not on the field specified when enabling search optimization.

SELECT * FROM test_table WHERE col2:name LIKE '%john%doe%';
SELECT * FROM test_table WHERE col2 LIKE '%hello%world%';
Copy

You can specify multiple fields to be optimized. Here, search optimization is enabled for two specific fields in the column col2.

ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:name);
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
Copy

If you enable search optimization for a given field, it is enabled for any subfields. The second ALTER TABLE statement below is redundant because the first statement enables search optimization for the entire data field, including the nested search field.

ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data);
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
Copy

Similarly, enabling search optimization for an entire column allows all substring searches on that column to be optimized, including fields nested to any depth within it.

How Constants Are Evaluated for VARIANT Substring Searches

When evaluating the constant string in a query (e.g. LIKE 'constant_string'), the search optimization service splits the string into tokens using the following characters as delimiters:

  • Square brackets ([ and ]).

  • Curly braces ({ and }).

  • Colons (:).

  • Commas (,).

  • Double quotes (").

After splitting the string into tokens, the search optimization service considers only tokens that are at least 5 characters long.

Example of a Predicate

How the Search Optimization Service Handles the Query

LIKE '%TEST%'

The search optimization service does not use search access paths for the following predicate because the substring is shorter than 5 characters.

LIKE '%SEARCH%IS%OPTIMIZED%'

The search optimization service can optimize this query, using search access paths to search for SEARCH and OPTIMIZED but not IS. IS is shorter than 5 characters.

LIKE '%HELLO_WORLD%'

The search optimization service can optimize this query, using search access paths to search for HELLO_WORLD.

LIKE '%COL:ON:S:EVE:RYWH:ERE%'

The search optimization service splits this string into COL, ON, S, EVE, RYWH, ERE. Because all of these tokens are shorter than 5 characters, the search optimization service cannot optimize this query.

LIKE '%{\"KEY01\":{\"KEY02\":\"value\"}%'

The search optimization service splits this into the tokens KEY01, KEY02, VALUE and uses the tokens when optimizing the query.

LIKE '%quo\"tes_and_com,mas,\"are_n\"ot\"_all,owed%'

The search optimization service splits this string into the tokens quo, tes_and_com, mas, are_n, ot, _all, owed. The search optimization service can only use the tokens that are 5 characters or longer (tes_and_com, are_n) when optimizing the query.

Current Limitations in Support for VARIANT Types

Currently, support for VARIANT types in the search optimization service has the following limitations:

  • Predicates that use XMLGET are not supported.

  • Predicates of the form variant_field IS NULL are not supported.

  • Predicates where the constants are results of scalar subqueries are not supported.

  • Predicates that specify paths to elements that contain sub-elements are not supported.

The current limitations of the search optimization service also apply to this feature.

Geospatial Functions

The search optimization service can improve the performance of queries with predicates that use geospatial functions with GEOGRAPHY objects.

Note

You must enable this feature for specific columns using the ON clause in the ALTER TABLE … ADD SEARCH OPTIMIZATION command. For example:

ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON GEO(mygeocol);
Copy

Queries that use geospatial functions are not improved if you omit the ON clause.

The following sections provide more details:

Note

GEOMETRY objects are not yet supported.

Supported Predicates With Geospatial Functions

For queries with predicates that use the following functions:

The search optimization service can improve performance if:

  • One input expression is a GEOGRAPHY column in a table, and

  • The other input expression is a GEOGRAPHY constant (created through a conversion or constructor function).

  • For ST_DWITHIN, the distance argument is a non-negative REAL constant.

Note that this feature has the same limitations that apply to the search optimization service.

Other Performance Considerations

Because the search optimization service is designed for predicates that are highly selective and because predicates filter by proximity between geospatial objects, clustering geospatial objects by proximity in the table can result in better performance. You can cluster your data either by specifying the sort order when loading the data or by using Automatic Clustering, depending on whether the base table changes frequently:

Loading Pre-Sorted Data

If the data in your base table does not change often, you can specify the sort order when loading the data. You can then enable search optimization on the GEOGRAPHY column. For example:

CREATE TABLE new_table AS SELECT * FROM source_table ORDER BY st_geohash(geom);
ALTER TABLE new_table ADD SEARCH OPTIMIZATION ON GEO(geom);
Copy

After every large change made to your base data, you can manually re-sort the data.

Automatic Clustering

If there are frequent updates to your base table, you can use the ALTER TABLE … CLUSTER BY … command to enable Automatic Clustering so the table is automatically reclustered as it changes.

The following example adds a new column geom_geohash of the type VARCHAR and stores the geohash or H3 index of the GEOGRAPHY column geom in that new column. It then enables Automatic Clustering with the new column as the cluster key. This approach will automatically recluster the parts of the table that change.

CREATE TABLE new_table AS SELECT *, ST_GEOHASH(geom) AS geom_geohash FROM source_table;
ALTER TABLE new_table CLUSTER BY (geom_geohash);
ALTER TABLE new_table ADD SEARCH OPTIMIZATION ON GEO(geom);
Copy

Examples That Use Geospatial Functions

The following statements create and configure the table used in the examples in this section. The last statement uses the ON clause in ALTER TABLE … ADD SEARCH OPTIMIZATION command to add search optimization for the g1 GEOGRAPHY column.

CREATE OR REPLACE TABLE geospatial_table (id NUMBER, g1 GEOGRAPHY);
INSERT INTO geospatial_table VALUES
  (1, 'POINT(-122.35 37.55)'),
  (2, 'LINESTRING(-124.20 42.00, -120.01 41.99)'),
  (3, 'POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))');
ALTER TABLE geospatial_table ADD SEARCH OPTIMIZATION ON GEO(g1);
Copy
Examples of Supported Predicates

The following query is an example of a query supported by the search optimization service. The search optimization service can use search access paths to improve the performance of this query:

SELECT id FROM geospatial_table WHERE
  ST_INTERSECTS(
    g1,
    TO_GEOGRAPHY('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'));
Copy

The following are examples of additional predicates that are supported by the search optimization service:

...
  ST_INTERSECTS(
    TO_GEOGRAPHY('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'),
    g1)
Copy
...
  ST_CONTAINS(
    TO_GEOGRAPHY('POLYGON((-74.17 40.64, -74.1796875 40.58, -74.09 40.58, -74.09 40.64, -74.17 40.64))'),
    g1)
Copy
...
  ST_CONTAINS(
    g1,
    TO_GEOGRAPHY('MULTIPOINT((0 0), (1 1))'))
Copy
...
  ST_WITHIN(
   TO_GEOGRAPHY('{"type" : "MultiPoint","coordinates" : [[-122.30, 37.55], [-122.20, 47.61]]}'),
   g1)
Copy
...
  ST_WITHIN(
    g1,
    TO_GEOGRAPHY('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'))
Copy
...
  ST_COVERS(
    TO_GEOGRAPHY('POLYGON((-1 -1, -1 4, 4 4, 4 -1, -1 -1))'),
    g1)
Copy
...
  ST_COVERS(
    g1,
    TO_GEOGRAPHY('POINT(0 0)'))
Copy
...
  ST_COVEREDBY(
    TO_GEOGRAPHY('POLYGON((1 1, 2 1, 2 2, 1 2, 1 1))'),
    g1)
Copy
...
  ST_COVEREDBY(
    g1,
    TO_GEOGRAPHY('POINT(-122.35 37.55)'))
Copy
...
  ST_DWITHIN(
    TO_GEOGRAPHY('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'),
    g1,
    100000)
Copy
...
  ST_DWITHIN(
    g1,
    TO_GEOGRAPHY('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'),
    100000)
Copy
Examples of Constructing GEOGRAPHY Constants

The following are examples of predicates that use different conversion and constructor functions for the GEOGRAPHY constant.

...
  ST_INTERSECTS(
    g1,
    ST_GEOGRAPHYFROMWKT('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'))
Copy
...
  ST_INTERSECTS(
    ST_GEOGFROMTEXT('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'),
    g1)
Copy
...
  ST_CONTAINS(
    ST_GEOGRAPHYFROMEWKT('POLYGON((-74.17 40.64, -74.1796875 40.58, -74.09 40.58, -74.09 40.64, -74.17 40.64))'),
    g1)
Copy
...
  ST_WITHIN(
    ST_GEOGRAPHYFROMWKB('01010000006666666666965EC06666666666C64240'),
    g1)
Copy
...
  ST_COVERS(
    g1,
    ST_MAKEPOINT(0.2, 0.8))
Copy
...
  ST_INTERSECTS(
    g1,
    ST_MAKELINE(
      TO_GEOGRAPHY('MULTIPOINT((0 0), (1 1))'),
      TO_GEOGRAPHY('POINT(0.8 0.2)')))
Copy
...
  ST_INTERSECTS(
    ST_POLYGON(
      TO_GEOGRAPHY('SRID=4326;LINESTRING(0.0 0.0, 1.0 0.0, 1.0 2.0, 0.0 2.0, 0.0 0.0)')),
    g1)
Copy
...
  ST_WITHIN(
    g1,
    TRY_TO_GEOGRAPHY('POLYGON((-1 -1, -1 4, 4 4, 4 -1, -1 -1))'))
Copy
...
  ST_COVERS(
    g1,
    ST_GEOGPOINTFROMGEOHASH('s00'))
Copy

Conjunctions of Supported Predicates (AND)

For queries that use conjunctions of predicates (i.e., AND), query performance can be improved by search optimization if any of the predicates adhere to the conditions above.

For example, suppose that a query has:

where condition_x and condition_y

Search optimization can improve performance if either condition separately returns a few rows (i.e., condition_x returns a few rows or condition_y returns a few rows).

If condition_x returns a few rows but condition_y returns many rows, the query performance can benefit from search optimization.

See these additional examples.

Disjunctions of Supported Predicates (OR)

For queries that use disjunctions of predicates (i.e., OR), query performance can be improved by search optimization if all predicates adhere to the conditions above.

For example, suppose that a query has:

where condition_x or condition_y

Search optimization can improve performance if each condition separately returns a few rows (i.e., condition_x returns a few rows and condition_y returns a few rows).

If condition_x returns a few rows but condition_y returns many rows, the query performance does not benefit from search optimization.

In the case of disjunctions, each predicate in isolation is not decisive in the query. The other predicates need to be evaluated before it can be determined if search optimization can improve performance.

Joins

The search optimization service does not directly improve the performance of joins. However, it can improve the performance of filtering rows from either table prior to the join, if the table has search optimization enabled and the predicate is selective.

Both tables do not need to have search optimization enabled. The decision to use search optimization is made for each table independently.

Views

The search optimization service can indirectly improve the performance of views (including secure views). If a base table for a view has search optimization enabled and if the query uses a selective predicate for that table, the search optimization service can improve performance when filtering rows.

All tables in the view do not need to have search optimization enabled. Search optimization is performed on each table independently.

Tables with Masking Policies and Row Access Policies

The search optimization service can improve query performance for tables that use masking policies and row access policies.

Note

When search optimization is enabled, a user who is prevented from seeing a value due to a masking policy or row access policy might be able to deduce with greater certainty whether that value exists. With or without search optimization, differences in query latency can provide hints about the presence or absence of data restricted by a policy, but this effect can be magnified by search optimization in some situations.

For example, suppose that a row access policy prevents a user from accessing rows with country = US, but the data does not include rows with country = US. Now suppose that search optimization is enabled for the country column and that the user runs a query with WHERE country = US. The query returns empty results as expected, but the query might run faster with search optimization than without. In this case, the user can infer that the data does not contain a row where country = US based on the time taken to run the query.

Queries Not Supported By the Search Optimization Service

The search optimization service does not support the following:

  • External tables.

  • Materialized views.

  • Columns defined with a COLLATE clause.

  • Column concatenation.

  • Analytical expressions.

  • Casts on table columns (except for fixed-point numbers cast to strings).

    Although search optimization supports predicates with implicit and explicit casts on constant values, it does not support predicates that cast values in the actual table column (except for casts from INTEGER and NUMBER to VARCHAR).

    For example, the following predicates are supported because they use implicit and explicit casts on constant values (not values in the table column):

    -- Supported predicate
    -- (where the string '2020-01-01' is implicitly cast to a date)
    WHERE timestamp1 = '2020-01-01';
    
    -- Supported predicate
    -- (where the string '2020-01-01' is explicitly cast to a date)
    WHERE timestamp1 = '2020-01-01'::date;
    
    Copy

    The following predicate is not supported because it uses a cast on values in the table column:

    -- Unsupported predicate
    -- (where values in a VARCHAR column are cast to DATE)
    WHERE to_date(varchar_column) = '2020-01-01';
    
    Copy

    The search optimization service considers the original column values, not the values after the cast. As a result, the search optimization service is not used for queries with these predicates.

As mentioned earlier, the exception to this rule is casting NUMBER or INTEGER values to VARCHAR values in the table column. The search optimization service does support this type of predicate:

-- Supported predicate
-- (where values in a numeric column are cast to a string)
WHERE cast(numeric_column as varchar) = '2'
Copy

Search optimization does not improve performance of queries that use Time Travel because search optimization works only on active data.

Configuring Search Optimization for a Table

Note

Adding search optimization to a large table (a table containing terabytes (TB) or more of data) might result in an immediate increase in credit consumption over a short period of time.

When you add search optimization to a table, the maintenance service immediately starts building the search access paths for the table in the background. If the table is large, the maintenance service might massively parallelize this work, which can result in increased costs over a short period of time.

Before you add search optimization to a large table, get an estimate of these costs so that you know what to expect.

To add search optimization to a table, you use the ALTER TABLEADD SEARCH OPTIMIZATION command with or without the ON clause.

Using the command without the ON clause sets up search access paths to improve the performance of queries using the EQUALITY search method for all columns that use the supported data types.

If you want to improve performance for other types of queries or if you need more control over which columns are configured for search optimization, use the ON clause in the ALTER TABLE … ADD SEARCH OPTIMIZATION command.

In the ON clause in ADD SEARCH OPTIMIZATION, you specify which columns should be enabled for search optimization. When enabling search optimization for a given column, you can also specify a search method (e.g. EQUALITY for equality and IN searches, GEO for GEOGRAPHY searches, and SUBSTRING for substring searches).

To manage the cost of search optimization, you can remove search optimization from specific columns where search optimization is not needed.

The next sections explain how to configure search optimization for a table:

Configuring Search Optimization for Specific Columns

To configure search optimization for a specific column, use the ALTER TABLEADD SEARCH OPTIMIZATION command with the ON clause.

Note

When running this command, use a role that has the privileges to add search optimization to the table.

The ON clause specifies that you want to configure search optimization for specific columns. For details on the syntax, see the section on ALTER TABLE … ADD SEARCH OPTIMIZATION.

Note

If you just want to apply search optimization for equality and IN predicates to all applicable columns in the table, see Adding and Maintaining Search Optimization for the Entire Table.

After running this command, you can verify that the columns have been configured for search optimization.

The next sections contain examples that demonstrate how to specify the configuration for search optimization:

Example: Supporting Equality and IN Predicates for Specific Columns

To optimize searches with equality predicates for the columns c1, c2, and c3 in the table t1, execute the following statement:

ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(c1, c2, c3);
Copy

You can also specify the same search method more than once in the ON clause:

-- This statement is equivalent to the previous statement.
ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(c1), EQUALITY(c2, c3);
Copy

Example: Supporting Equality and IN Predicates for All Applicable Columns

To optimize searches with equality predicates for all applicable columns in the table, execute the following statement:

ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(*);
Copy

Note the following:

Example: Supporting Different Types of Predicates

To optimize searches with equality predicates for the column c1 and c2 and substring searches for the column c3, execute the following statement:

ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(c1, c2), SUBSTRING(c3);
Copy

Example: Supporting Equality and IN Predicates for a Field in a VARIANT

To optimize searches with equality predicates on the VARIANT field uuid nested in the field user in the VARIANT column c4, execute the following statement:

ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(c4:user.uuid);
Copy

Example: Supporting Geospatial Functions

To optimize searches with predicates that use geospatial functions with GEOGRAPHY objects in the c1 column, execute the following statement:

ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON GEO(c1);
Copy

Adding and Maintaining Search Optimization for the Entire Table

If you just want to specify EQUALITY for all columns of the supported data types (except for VARIANT and GEOGRAPHY), use the ALTER TABLEADD SEARCH OPTIMIZATION command without the ON clause.

Note

When running this command, use a role that has the privileges to add search optimization to the table.

For example:

alter table test_table add search optimization;
Copy

For more information on the syntax, see the section on search optimization in ALTER TABLE.

After running this command, you can verify that the columns have been configured for search optimization.

Effect on Subsequently Added Columns

After you run ALTER TABLE … ADD SEARCH OPTIMIZATION command without the ON clause, any columns that are subsequently added to the table will also be configured for EQUALITY.

However, if you execute ALTER TABLE … { ADD | DROP } SEARCH OPTIMIZATION with the ON clause on the same table, any columns that are subsequently added to the table will not be configured for EQUALITY automatically. You must execute ALTER TABLE … ADD SEARCH OPTIMIZATION ON … to configure these newly added columns for EQUALITY.

Verifying That the Table Is Configured For Search Optimization

To verify that the table and its columns have been configured for search optimization:

  1. Display the search optimization configuration for the table and its columns.

  2. Run the SHOW TABLES command to verify that search optimization has been added and to determine how much of the table has been optimized.

    For example:

    SHOW TABLES LIKE '%test_table%';
    
    Copy

    In the output from this command:

    • Verify that SEARCH_OPTIMIZATION is ON, which indicates that search optimization has been added.

    • Check the value of SEARCH_OPTIMIZATION_PROGRESS. This specifies the percentage of the table that has been optimized so far.

      When search optimization is first added to a table, the performance benefits do not appear immediately. The search optimization service starts populating data in the background. The benefits appear increasingly as the maintenance catches up to the current state of the table.

      Before you run a query to verify that search optimization is working, wait until this shows that the table has been fully optimized.

  3. Run a query to verify that search optimization is working.

    Note that the Snowflake optimizer automatically chooses when to use the search optimization service for a particular query. Users cannot control which queries search optimization is used for.

    Choose a query that the search optimization service is designed to optimize. See Identifying the Tables and Columns That Benefit From Search Optimization.

  4. In the web UI, view the query plan for this query, and verify that the query node “Search Optimization Access” is part of the query plan.

Displaying the Search Optimization Configuration for a Table

To display the search optimization configuration for a table, use the DESCRIBE SEARCH OPTIMIZATION command.

For example, suppose that you execute the following statement to configure search optimization for a column:

ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(c1);
Copy

Executing DESCRIBE SEARCH OPTIMIZATION produces the following output:

DESCRIBE SEARCH OPTIMIZATION ON t1;

+---------------+----------+--------+------------------+--------+
| expression_id |  method  | target | target_data_type | active |
+---------------+----------+--------+------------------+--------+
| 1             | EQUALITY | C1     | NUMBER(38,0)     | true   |
+---------------+----------+--------+------------------+--------+
Copy

Managing the Costs of the Search Optimization Service

The search optimization service impacts costs for both storage and compute resources:

  • Storage resources: The search optimization service creates a search access path data structure that requires space for each table on which search optimization is enabled. The storage cost of the search access path depends upon multiple factors, including:

    • The number of distinct values (NDVs) in the table. In the extreme case where all columns have data types that use the search access path, and all data values in each column are unique, the required storage can be as much as the original table’s size.

      Typically, however, the size is approximately 1/4 of the original table’s size.

  • Compute resources:

    • Adding search optimization to a table consumes resources during the initial build phase.

    • Maintaining the search optimization service also requires resources. Resource consumption is higher when there is high churn (i.e. when large volumes of data in the table change). These costs are roughly proportional to the amount of data ingested (added or changed). Deletes also have some cost.

      Automatic clustering, while improving the latency of queries in tables with search optimization, can further increase the maintenance costs of search optimization. If a table has a high churn rate, enabling automatic clustering and configuring search optimization for the table can result in higher maintenance costs than if the table is just configured for search optimization.

      Snowflake ensures efficient credit usage by billing your account only for the actual resources used. Billing is calculated in 1-second increments.

      See the “Serverless Feature Credit Table” in the Snowflake service consumption table for the costs per compute hour.

      Once you enable the search optimization service, you can view the costs for your use of the service.

Tip

Snowflake recommends starting slowly with this feature (i.e. adding search optimization to only a few tables at first) and closely monitoring the costs and benefits.

Estimating the Costs

To estimate the cost of adding search optimization to a table and configuring specific columns for search optimization, use the SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS function.

In general, the costs are proportional to:

  • The number of columns on which the feature is enabled and the number of distinct values in those columns.

  • The amount of data that changes in these tables.

Important

Cost estimates returned by the SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS function are best efforts. The actual realized costs can vary by up to 50% (or, in rare cases, by several times) from the estimated costs.

  • Build and storage cost estimates are based on sampling a subset of the rows in the table

  • Maintenance cost estimates are based on recent create, delete, and update activity in the table

Viewing the Costs

You can view the billing costs for the search optimization service by using either the web interface or SQL. See Exploring Compute Cost.

Reducing the Costs

You can control the cost of the search optimization service by carefully choosing the tables and columns for which to enable search optimization.

In addition, to reduce the cost of the search optimization service:

  • Snowflake recommends batching DML operations on the table:

    • DELETE: If tables store data for the most recent time period (e.g. the most recent day or week or month), then when you trim your table by deleting old data, the search optimization service must take into account the updates. In some cases, you might be able to reduce costs by deleting less frequently (e.g. daily rather than hourly).

    • INSERT, UPDATE, and MERGE: Batching these types of DML statements on the table can reduce the cost of maintenance by the search optimization service.

  • If you recluster the entire table, consider dropping the SEARCH OPTIMIZATION property for that table before reclustering, and then add the SEARCH OPTIMIZATION property back to the table after reclustering.

  • Before enabling search optimization for substring searches (ON SUBSTRING(col)) or VARIANTs (ON EQUALITY(variant_col)), call SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS to estimate the costs. The initial build and maintenance for these search methods can be computationally intensive, so you should assess the trade-off between performance and cost.

Removing Search Optimization From Specific Columns or the Entire Table

You can remove the search optimization configuration for specific columns, or you can remove the SEARCH OPTIMIZATION property from the entire table.

Dropping Search Optimization for Specific Columns

To drop the search optimization configuration for specific columns, use the following command: ALTER TABLEDROP SEARCH OPTIMIZATION command with the ON clause.

For example, suppose that executing the DESCRIBE SEARCH OPTIMIZATION command prints the following expressions:

DESCRIBE SEARCH OPTIMIZATION ON t1;

+---------------+-----------+-----------+-------------------+--------+
| expression_id |  method   | target    | target_data_type  | active |
+---------------+-----------+-----------+-------------------+--------+
|             1 | EQUALITY  | C1        | NUMBER(38,0)      | true   |
|             2 | EQUALITY  | C2        | VARCHAR(16777216) | true   |
|             3 | EQUALITY  | C4        | NUMBER(38,0)      | true   |
|             4 | EQUALITY  | C5        | VARCHAR(16777216) | true   |
|             5 | EQUALITY  | V1        | VARIANT           | true   |
|             6 | SUBSTRING | C2        | VARCHAR(16777216) | true   |
|             7 | SUBSTRING | C5        | VARCHAR(16777216) | true   |
|             8 | GEO       | G1        | GEOGRAPHY         | true   |
|             9 | EQUALITY  | V1:"key1" | VARIANT           | true   |
|            10 | EQUALITY  | V1:"key2" | VARIANT           | true   |
+---------------+-----------+-----------+-------------------+--------+
Copy

To drop search optimization for substrings on the column c2, execute the following statement:

ALTER TABLE t1 DROP SEARCH OPTIMIZATION ON SUBSTRING(c2);
Copy

To drop search optimization for all methods on the column c5, execute the following statement:

ALTER TABLE t1 DROP SEARCH OPTIMIZATION ON c5;
Copy

Because the column c5 is configured to optimize equality and substring searches, the statement above drops the configuration for equality and substring searches for c5.

To drop search optimization for equality on the column c1 and to drop the configuration specified by the expression IDs 6 and 8, execute the following statement:

ALTER TABLE t1 DROP SEARCH OPTIMIZATION ON EQUALITY(c1), 6, 8;
Copy

For more information on the syntax, see the section on ALTER TABLE … DROP SEARCH OPTIMIZATION.

Removing Search Optimization From the Table

To remove the SEARCH OPTIMIZATION property from a table:

  1. Switch to a role that has the privileges to remove search optimization from the table.

  2. Run the ALTER TABLEDROP SEARCH OPTIMIZATION command without the ON clause:

    ALTER TABLE [IF EXISTS] <table_name> DROP SEARCH OPTIMIZATION;
    
    Copy

    For example:

    alter table test_table drop search optimization;
    
    Copy

    For more information, see the section on ALTER TABLE … DROP SEARCH OPTIMIZATION.

Examples

The following code shows creation and use of a search optimization service.

Start by creating a table with data:

create or replace table test_table (id int, c1 int, c2 string, c3 date) as
select * from values
  (1, 3, '4',  '1985-05-11'),
  (2, 4, '3',  '1996-12-20'),
  (3, 2, '1',  '1974-02-03'),
  (4, 1, '2',  '2004-03-09'),
  (5, null, null,  null);
Copy

Add the SEARCH OPTIMIZATION property to the table:

alter table test_table add search optimization;
Copy

The following queries can use the search optimization service:

select * from test_table where id = 2;
Copy
select * from test_table where c2 = '1';
Copy
select * from test_table where c3 = '1985-05-11';
Copy
select * from test_table where c1 is null;
Copy
select * from test_table where c1 = 4 and c3 = '1996-12-20';
Copy

The following query can use the search optimization service because the implicit cast is on the constant, not the column:

select * from test_table where c2 = 2;
Copy

The following cannot use the search optimization service because the cast is on the table’s column:

select * from test_table where cast(c2 as number) = 2;
Copy

An IN clause is supported by the search optimization service:

select id, c1, c2, c3
    from test_table
    where id IN (2, 3)
    order by id;
Copy

If predicates are individually supported by the search optimization service, then they can be joined by the conjunction AND and still be supported by the search optimization service:

select id, c1, c2, c3
    from test_table
    where c1 = 1
       and
          c3 = TO_DATE('2004-03-09')
    order by id;
Copy

DELETE and UPDATE (and MERGE) can also use the search optimization service:

delete from test_table where id = 3;
Copy
update test_table set c1 = 99 where id = 4;
Copy

Compatibility with Query Acceleration

Search optimization and query acceleration can work together to optimize query performance. First, search optimization can prune the micro-partitions not needed for a query. Then, for eligible queries, query acceleration can offload portions of the rest of the work to shared compute resources provided by the service.

Performance of queries accelerated by both services varies depending on workload and available resources.