Speeding up queries of structured data with search optimization¶
The search optimization service can improve the performance of point-lookup and substring queries on structured data in Snowflake tables; that is, data in structured ARRAY, OBJECT, and MAP columns. You can configure search optimization on columns of these types even when the structure is deeply nested and changes frequently. You can also enable search optimization for specific elements within a structured column.
The following sections provide more information about search optimization support for queries of structured data:
Enabling search optimization for queries of structured data¶
To improve the performance for queries of structured data types on a table, use the ON clause in the ALTER TABLE … ADD SEARCH OPTIMIZATION command for specific columns or elements in columns. Queries against structured ARRAY, OBJECT, and MAP columns aren’t optimized if you omit the ON clause. Enabling search optimization at the table level doesn’t enable it for columns with structured data types.
For example:
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON SUBSTRING(array_column);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON EQUALITY(array_column[1]);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON EQUALITY(object_column);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON SUBSTRING(object_column:key);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON EQUALITY(map_column);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON EQUALITY(map_column:user.uuid);
The following rules apply to the keywords you use in these ALTER TABLE … ADD SEARCH OPTIMIZATION commands:
You can use the EQUALITY keyword with any inner element or the column itself.
You can use the SUBSTRING keyword only with inner elements that have text string data types.
For more information, see Enabling and disabling search optimization.
Supported data types for constants and casts in predicates for structured types¶
The search optimization service can improve the performance of point lookups of structured data where the following types are used for the constant and the implicit or explicit cast for the element:
FIXED (including casts that specify a valid precision and scale)
INTEGER (including synonymous types)
VARCHAR (including synonymous types)
DATE (including casts that specify a scale)
TIME (including casts that specify a scale)
TIMESTAMP, TIMESTAMP_LTZ, TIMESTAMP_NTZ, TIMESTAMP_TZ (including casts that specify a scale)
The search optimization service supports the casting of types by using the following conversion functions:
Supported predicates for point lookups on structured types¶
The search optimization service can improve point-lookup queries with the types of predicates shown in the following
list. In the examples, src is the column with a structured data type, and path_to_element is a
path to an element in the column with a structured data type:
Equality predicates of the following form:
WHERE path_to_element[::target_data_type] = constantIn this syntax,
target_data_type(if specified) and the data type ofconstantmust be one of the supported types.For example, the search optimization service supports the following predicates:
Matching an OBJECT or MAP element against a NUMBER constant without explicitly casting the element:
WHERE src:person.age = 42;
Explicitly casting an OBJECT or MAP element to NUMBER with a specified precision and scale:
WHERE src:location.temperature::NUMBER(8, 6) = 23.456789;
Matching an OBJECT or MAP element against a VARCHAR constant without explicitly casting the element:
WHERE src:sender_info.ip_address = '123.123.123.123';
Explicitly casting an OBJECT or MAP element to VARCHAR:
WHERE src:salesperson.name::VARCHAR = 'John Appleseed';
Explicitly casting an OBJECT or MAP element to DATE:
WHERE src:events.date::DATE = '2021-03-26';
Explicitly casting an OBJECT or MAP element to TIMESTAMP with a specified scale:
WHERE src:event_logs.exceptions.timestamp_info(3) = '2021-03-26 15:00:00.123 -0800';
Matching an ARRAY element against a value of a supported type, with or without an explicit cast:
WHERE my_array_column[2] = 5; WHERE my_array_column[2]::NUMBER(4, 1) = 5;
Matching an OBJECT or MAP element against a value of a supported type, with or without an explicit cast:
WHERE object_column['mykey'] = 3; WHERE object_column:mykey = 3; WHERE object_column['mykey']::NUMBER(4, 1) = 3; WHERE object_column:mykey::NUMBER(4, 1) = 3;
Predicates that use the ARRAY functions, such as the following predicates:
WHERE ARRAY_CONTAINS(value_expr, array)In this syntax,
value_exprmust not be NULL and must evaluate to VARIANT. The data type of the value must be one of the supported types:WHERE ARRAY_CONTAINS('77.146.211.88'::VARIANT, src:logs.ip_addresses)
In this example, the value is a constant that is implicitly cast to an OBJECT:
WHERE ARRAY_CONTAINS(300, my_array_column)
WHERE ARRAYS_OVERLAP(ARRAY_CONSTRUCT(constant_1, constant_2, .., constant_N), array)The data type of each constant —
constant_1,constant_2, and so on — must be one of the supported types. The constructed ARRAY can include NULL constants.In this example, the array is in an OBJECT value:
WHERE ARRAYS_OVERLAP( ARRAY_CONSTRUCT('122.63.45.75', '89.206.83.107'), src:senders.ip_addresses)
In this example, the array is in an ARRAY column:
WHERE ARRAYS_OVERLAP( ARRAY_CONSTRUCT('a', 'b'), my_array_column)
The following predicates check for NULL values:
WHERE IS_NULL_VALUE(path_to_element)Note
IS_NULL_VALUE applies to JSON null values and not to SQL NULL values.
WHERE path_to_element IS NOT NULLWHERE structured_column IS NULLwhere
structured_columnrefers to the column and not a path to an element in the structured data.For example, the search optimization service supports using the OBJECT column
srcbut not the path to the elementsrc:person.agein that OBJECT column.
Substring search in structured types¶
You can enable substring search only if the target structured element is a text string data type.
For example, consider the following table:
CREATE TABLE t(
col OBJECT(
a INTEGER,
b STRING,
c MAP(INTEGER, STRING),
d ARRAY(STRING)
)
);
For this table, search optimization for SUBSTRING search can be added on the following target structured elements:
col:bbecause its type is STRING.col:c[value]— for example,col:c[0],col:c[100]— if the values are text string types.
For this table, search optimization for SUBSTRING search can’t be added on the following target structured elements:
colbecause its type is structured OBJECT.col:abecause its type is INTEGER.col:cbecause its type is MAP.col:dbecause its type is ARRAY.
The search optimization service can optimize predicates that use the following functions:
You can enable substring search optimization for a column or for multiple individual elements within a column. For example, the following statement enables substring search optimization for a nested element in a column:
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
After the search access path has been built, the following query can be optimized:
SELECT * FROM test_table WHERE col2:data.search LIKE '%optimization%';
However, the following queries aren’t optimized because the WHERE clause filters don’t apply to the element
that was specified when search optimization was enabled (col2:data.search):
SELECT * FROM test_table WHERE col2:name LIKE '%simon%parker%';
SELECT * FROM test_table WHERE col2 LIKE '%hello%world%';
You can specify multiple elements to be optimized. In the following example, search optimization is enabled for two specific
elements in the column col2:
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:name);
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
If you enable search optimization for a given element, it is enabled for any unnested elements of a text string type. Search optimization isn’t enabled for nested elements or elements of non-text string types.
How constants are evaluated for structured substring searches¶
When it evaluates the constant string in a query — for example, LIKE 'constant_string' — the search optimization service splits the
string into tokens by using the following characters as delimiters:
Square brackets (
[and]).Curly braces (
{and}).Colons (
:).Commas (
,).Double quotes (
").
After it splits the string into tokens, the search optimization service considers only tokens that are at least five characters long. The following table explains how the search optimization service handles various predicate examples:
Example of a predicate |
How the search optimization service handles the query |
|---|---|
|
The search optimization service doesn’t use search access paths for the following predicate because the substring is shorter than five characters. |
|
The search optimization service can optimize this query, by using search access paths to search for |
|
The search optimization service can optimize this query, by using search access paths to search for |
|
The search optimization service splits this string into |
|
The search optimization service splits this string into the tokens |
|
The search optimization service splits this string into the tokens |
Schema evolution support¶
The schema of structured columns can evolve over time. For more information about schema evolution, see ALTER ICEBERG TABLE … ALTER COLUMN … SET DATA TYPE (structured types).
As part of a single schema-evolution operation, the following modifications can occur:
Type widening
Reordering elements
Adding elements
Removing elements
Renaming elements
The search optimization service isn’t invalidated as part of the schema-evolution operation. Instead, the search optimization service handles operations in the following ways:
- Type widening (for example, INT to NUMBER)
Search optimization access paths aren’t affected.
- Adding elements
The newly added elements are automatically reflected in the existing search optimization access paths.
- Removing elements
When elements are removed from a structured column, the search optimization service automatically drops access paths that are prefixed by the removed element.
For example, create a table with a column of OBJECT type, and then insert data:
CREATE OR REPLACE TABLE test_struct ( a OBJECT( b INTEGER, c OBJECT( d STRING, e VARIANT ) ) ); INSERT INTO test_struct (a) SELECT { 'b': 100, 'c': { 'd': 'value1', 'e': 'value2' } }::OBJECT( b INTEGER, c OBJECT( d STRING, e VARIANT ) );
To view the data, query the table:
SELECT * FROM test_struct;
+--------------------+ | A | |--------------------| | { | | "b": 100, | | "c": { | | "d": "value1", | | "e": "value2" | | } | | } | +--------------------+
The following statement removes element
cfrom the object:ALTER TABLE test_struct ALTER COLUMN a SET DATA TYPE OBJECT( b INTEGER);
When this statement runs, the access paths at
a,a:c,a:c:danda:c:eare dropped.- Renaming elements
When an element is renamed, the search optimization service automatically drops access paths prefixed by the renamed element and adds them back with the newly named path. This operation incurs an additional maintenance cost to process the newly added path in the search optimization service.
For example, create a table with a column of OBJECT type, and then insert data:
CREATE OR REPLACE TABLE test_struct ( a OBJECT( b INTEGER, c OBJECT( d STRING, e VARIANT ) ) ); INSERT INTO test_struct (a) SELECT { 'b': 100, 'c': { 'd': 'value1', 'e': 'value2' } }::OBJECT( b INTEGER, c OBJECT( d STRING, e VARIANT ) );
To view the data, query the table:
SELECT * FROM test_struct;
+--------------------+ | A | |--------------------| | { | | "b": 100, | | "c": { | | "d": "value1", | | "e": "value2" | | } | | } | +--------------------+
The following statement renames element
ctoc_newin the object:ALTER TABLE test_struct ALTER COLUMN a SET DATA TYPE OBJECT( b INTEGER, c_new OBJECT( d STRING, e VARIANT ) ) RENAME FIELDS;
The access paths at
a,a:c,a:c:d,a:c:eare dropped and re-added asa,a:c_new,a:c_new:d,a:c_new:e.- Reordering elements
Search optimization access paths aren’t affected.
Current limitations in support for structured types¶
Support for structured types in the search optimization service is limited in the following ways:
Predicates of the form
path_to_element IS NULLaren’t supported.Predicates where the constants are results of scalar subqueries aren’t supported.
Predicates that specify paths to elements that contain sub-elements aren’t supported.
Predicates that use the XMLGET function aren’t supported.
Predicates that use the MAP_CONTAINS_KEY function aren’t supported.
The current limitations of the search optimization service also apply to structured types.