Speeding up queries of semi-structured data with search optimization¶
The search optimization service can improve the performance of point lookup and substring queries on semi-structured data in Snowflake tables (that is, data in VARIANT, OBJECT, and ARRAY columns). You can configure search optimization on columns of these types even when the structure is deeply nested and changes frequently. You can also enable search optimization for specific elements within a semi-structured column.
Note
You must explicitly enable this feature for specific columns or elements in columns using the ON clause in the ALTER TABLE … ADD SEARCH OPTIMIZATION command. Enabling search optimization at the table level doesn’t enable it for columns with semi-structured data types. For example:
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON EQUALITY(myvariantcol);
ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(c4:user.uuid);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON SUBSTRING(myvariantcol);
ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON SUBSTRING(c4:user.uuid);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON EQUALITY(object_column);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON SUBSTRING(object_column);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON EQUALITY(array_column);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON SUBSTRING(array_column);
Queries against VARIANT, OBJECT, and ARRAY columns aren’t optimized if you omit the ON clause.
The following sections provide more information about this support:
Supported data types for constants and casts in predicates for semi-structured types¶
The search optimization service can improve the performance of point lookups of semi-structured data where the following types are used for the constant and the implicit or explicit cast for the element:
FIXED (including casts that specify a valid precision and scale)
INTEGER
TEXT
DATE (including casts that specify a scale)
TIME (including casts that specify a scale)
TIMESTAMP, TIMESTAMP_LTZ, TIMESTAMP_NTZ, TIMESTAMP_TZ (including casts that specify a scale)
The search optimization service supports the casting of types using:
Support for semi-structured data type values cast to TEXT¶
The search optimization service can also improve the performance of point lookups in which columns with semi-structured data types are cast to TEXT and are compared to constants that are cast to TEXT.
For example, suppose that src
is a VARIANT column containing BOOLEAN, DATE, and TIMESTAMP values that have been converted to VARIANT:
CREATE OR REPLACE TABLE test_table
(
id INTEGER,
src VARIANT
);
INSERT INTO test_table SELECT 1, TO_VARIANT('true'::BOOLEAN);
INSERT INTO test_table SELECT 2, TO_VARIANT('2020-01-09'::DATE);
INSERT INTO test_table SELECT 3, TO_VARIANT('2020-01-09 01:02:03.899'::TIMESTAMP);
For this table, the search optimization service can improve the following queries, which cast the VARIANT column to TEXT and compare the column to string constants:
SELECT * FROM test_table WHERE src::TEXT = 'true';
SELECT * FROM test_table WHERE src::TEXT = '2020-01-09';
SELECT * FROM test_table WHERE src::TEXT = '2020-01-09 01:02:03.899';
Supported predicates for point lookups on VARIANT types¶
The search optimization service can improve point lookup queries with the types of predicates listed below. In the examples
below, src
is the column with a semi-structured data type, and path_to_element
is a
path to an element in the column with a semi-structured data type.
Equality predicates of the following form:
WHERE path_to_element[::target_data_type] = constant
In this syntax,
target_data_type
(if specified) and the data type ofconstant
must be one of the supported types.For example, the search optimization service supports:
Matching a VARIANT element against a NUMBER constant without explicitly casting the element.
WHERE src:person.age = 42;
Explicitly casting a VARIANT element to NUMBER with a specified precision and scale.
WHERE src:location.temperature::NUMBER(8, 6) = 23.456789;
Matching a VARIANT element against a TEXT constant without explicitly casting the element.
WHERE src:sender_info.ip_address = '123.123.123.123';
Explicitly casting a VARIANT element to TEXT.
WHERE src:salesperson.name::TEXT = 'John Appleseed';
Explicitly casting a VARIANT element to DATE.
WHERE src:events.date::DATE = '2021-03-26';
Explicitly casting a VARIANT element to TIMESTAMP with a specified scale.
WHERE src:event_logs.exceptions.timestamp_info(3) = '2021-03-26 15:00:00.123 -0800';
Matching an ARRAY element against a value of a supported type, with or without explicitly casting to the type. For example:
WHERE my_array_column[2] = 5; WHERE my_array_column[2]::NUMBER(4, 1) = 5;
Matching an OBJECT element against a value of a supported type, with or without explicitly casting to the type. For example:
WHERE object_column['mykey'] = 3; WHERE object_column:mykey = 3; WHERE object_column['mykey']::NUMBER(4, 1) = 3; WHERE object_column:mykey::NUMBER(4, 1) = 3;
Predicates that use the ARRAY functions, such as:
WHERE ARRAY_CONTAINS(value_expr, array)
In this syntax,
value_expr
must not be NULL and must evaluate to VARIANT. The data type of the value must be one of the supported types.For example:
WHERE ARRAY_CONTAINS('77.146.211.88'::VARIANT, src:logs.ip_addresses)
In this example, the value is a constant that is implicitly cast to a VARIANT:
WHERE ARRAY_CONTAINS(300, my_array_column)
WHERE ARRAYS_OVERLAP(ARRAY_CONSTRUCT(constant_1, constant_2, .., constant_N), array)
The data type of each constant (
constant_1
,constant_2
, and so on) must be one of the supported types. The constructed ARRAY can include NULL constants.In this example, the array is in a VARIANT value:
WHERE ARRAYS_OVERLAP( ARRAY_CONSTRUCT('122.63.45.75', '89.206.83.107'), src:senders.ip_addresses)
In this example, the array is an ARRAY column:
WHERE ARRAYS_OVERLAP( ARRAY_CONSTRUCT('a', 'b'), my_array_column)
The following predicates that check for NULL values:
WHERE IS_NULL_VALUE(path_to_element)
Note that IS_NULL_VALUE applies to JSON null values and not to SQL NULL values.
WHERE path_to_element IS NOT NULL
WHERE semistructured_column IS NULL
where
semistructured_column
refers to the column and not a path to an element in the semi-structured data.For example, the search optimization service supports using the VARIANT column
src
but not the path to the elementsrc:person.age
in that VARIANT column.
Substring search in VARIANT types¶
The search optimization service can optimize wildcard or regular expression searches in semi-structured columns (that is, VARIANT, OBJECT, and ARRAY columns) or elements in such columns. This includes predicates that use:
You can enable substring search optimization for a column or for multiple individual elements within a column. For example, the following statement enables substring search optimization for a nested element in a column.
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
Once the search access path has been built, the following query can be optimized:
SELECT * FROM test_table WHERE col2:data.search LIKE '%optimization%';
However, the following queries are not optimized because the WHERE clause filters don’t apply to the element
that was specified when search optimization was enabled (col2:data.search
).
SELECT * FROM test_table WHERE col2:name LIKE '%simon%parker%';
SELECT * FROM test_table WHERE col2 LIKE '%hello%world%';
You can specify multiple elements to be optimized. Here, search optimization is enabled for two specific elements in the column col2
.
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:name);
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
If you enable search optimization for a given element, it is enabled for any nested elements. The second ALTER TABLE statement
below is redundant because the first statement enables search optimization for the entire data
element, including
the nested search
element.
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data);
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
Similarly, enabling search optimization for an entire column allows all substring searches on that column to be optimized, including elements nested to any depth within it.
For an example that enables FULL_TEXT search optimization on a VARIANT column in the car_sales
table and its data
(described in Querying Semi-structured Data), see Enable FULL_TEXT search optimization on a VARIANT column.
How constants are evaluated for VARIANT substring searches¶
When evaluating the constant string in a query (for example, LIKE 'constant_string'
), the search optimization service splits the
string into tokens using the following characters as delimiters:
Square brackets (
[
and]
).Curly braces (
{
and}
).Colons (
:
).Commas (
,
).Double quotes (
"
).
After splitting the string into tokens, the search optimization service considers only tokens that are at least five characters long.
Example of a predicate |
How the search optimization service handles the query |
---|---|
|
The search optimization service doesn’t use search access paths for the following predicate because the substring is shorter than five characters. |
|
The search optimization service can optimize this query, using search access paths to search for |
|
The search optimization service can optimize this query, using search access paths to search for |
|
The search optimization service splits this string into |
|
The search optimization service splits this into the tokens |
|
The search optimization service splits this string into the tokens |
Current limitations in support for semi-structured types¶
Support for semi-structured types in the search optimization service is limited in the following ways:
Predicates that use XMLGET aren’t supported.
Predicates of the form
path_to_element IS NULL
aren’t supported.Predicates where the constants are results of scalar subqueries aren’t supported.
Predicates that specify paths to elements that contain sub-elements aren’t supported.
The current limitations of the search optimization service also apply to this feature.