Speeding up queries of semi-structured data with search optimization¶
The search optimization service can improve the performance of point lookup and substring queries on semi-structured data in Snowflake tables (data in VARIANT, OBJECT, and ARRAY columns).
When VARIANT support for the search optimization service is configured for columns in a table, the search optimization service automatically includes VARIANT, OBJECT, and ARRAY columns in a search access path. This even applies to columns where the structure is deeply nested and the structure changes frequently. You may also enable search optimization for specific fields within a semi-structured column.
Note
You must explicitly enable this feature for specific columns or fields in columns using the ON clause in the ALTER TABLE … ADD SEARCH OPTIMIZATION command. (Enabling search optimization at the table level does not enable it for VARIANT columns.) For example:
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON EQUALITY(myvariantcol);
ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(c4:user.uuid);
ALTER TABLE mytable ADD SEARCH OPTIMIZATION ON SUBSTRING(myvariantcol);
ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON SUBSTRING(c4:user.uuid);
Queries against a VARIANT column are not optimized if you omit the ON clause.
The next sections provide more details about this support:
Supported data types for constants and casts in predicates for VARIANT types¶
The search optimization service can improve the performance of point lookups of semi-structured data where the following types are used for the constant and the implicit or explicit cast for the element:
FIXED (including casts that specify a valid precision and scale)
INTEGER
TEXT
DATE (including casts that specify a scale)
TIME (including casts that specify a scale)
TIMESTAMP, TIMESTAMP_LTZ, TIMESTAMP_NTZ, TIMESTAMP_TZ (including casts that specify a scale)
The search optimization service supports the casting of types using:
Support for VARIANT values cast as TEXT¶
The search optimization service can also improve the performance of point lookups in which VARIANT columns are cast to TEXT and are compared to constants that are cast to TEXT.
For example, suppose that src
is a VARIANT column containing BOOLEAN, DATE, and TIMESTAMP values that have been converted to VARIANT:
CREATE OR REPLACE TABLE test_table
(
id INTEGER,
src VARIANT
);
INSERT INTO test_table SELECT 1, TO_VARIANT('true'::BOOLEAN);
INSERT INTO test_table SELECT 2, TO_VARIANT('2020-01-09'::DATE);
INSERT INTO test_table SELECT 3, TO_VARIANT('2020-01-09 01:02:03.899'::TIMESTAMP);
For this table, the search optimization service can improve the following queries, which cast the VARIANT column to TEXT and compare the column to string constants:
SELECT * FROM test_table WHERE src::TEXT = 'true';
SELECT * FROM test_table WHERE src::TEXT = '2020-01-09';
SELECT * FROM test_table WHERE src::TEXT = '2020-01-09 01:02:03.899';
Supported predicates for point lookups on VARIANT types¶
The search optimization service can improve point lookup queries with the types of predicates listed below. In the examples below, src
is the VARIANT column, and path_to_variant_field
is a
path to a field in the VARIANT column.
Equality predicates of the following form:
WHERE path_to_variant_field[::target_data_type] = constant
In this syntax,
target_data_type
(if specified) and the data type ofconstant
must be one of the supported types.For example, the search optimization service supports:
Matching an element against a NUMBER constant without explicitly casting the element.
WHERE src:person.age = 42;
Explicitly casting an element to NUMBER with a specified precision and scale.
WHERE src:location.temperature::NUMBER(8, 6) = 23.456789;
Matching an element against a TEXT constant without explicitly casting the element.
WHERE src:sender_info.ip_address = '123.123.123.123';
Explicitly casting an element to TEXT.
WHERE src:salesperson.name::TEXT = 'John Appleseed';
Explicitly casting an element to DATE.
WHERE src:events.date::DATE = '2021-03-26';
Explicitly casting an element to TIMESTAMP with a specified scale.
WHERE src:event_logs.exceptions.timestamp_info(3) = '2021-03-26 15:00:00.123 -0800';
Predicates that use the ARRAY functions, such as:
WHERE ARRAY_CONTAINS(constant::VARIANT, path_to_variant_field)
In this syntax,
constant
must not be NULL, and the data type ofconstant
must be one of the supported types.For example:
WHERE ARRAY_CONTAINS('77.146.211.88'::VARIANT, src:logs.ip_addresses)
WHERE ARRAYS_OVERLAP(ARRAY_CONSTRUCT(constant_1, constant_2, .., constant_N), path_to_variant_field)
The data type of each constant (
constant_1
,constant_2
, etc.) must be one of the supported types. The constructed ARRAY can include NULL constants.For example:
WHERE ARRAYS_OVERLAP( ARRAY_CONSTRUCT('122.63.45.75', '89.206.83.107'), src:senders.ip_addresses)
The following predicates that check for NULL values:
WHERE IS_NULL_VALUE(path_to_variant_field)
Note that IS_NULL_VALUE applies to JSON null values and not to SQL NULL values.
WHERE path_to_variant_field IS NOT NULL
WHERE variant_column IS NULL
where
variant_column
refers to the column and not a path to an element in the semi-structured data.For example, the search optimization service supports using the VARIANT column
src
but not the path to the fieldsrc:person.age
in that VARIANT column.
Substring search in VARIANT types¶
The search optimization service can optimize wildcard or regular expression searches in semi-structured columns (including ARRAY, OBJECT, and VARIANT columns) or fields in such columns. This includes predicates that use:
You can enable substring search optimization for a column or for multiple individual fields within a column. For example, the following statement enables substring search optimization for a nested field in a column.
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
Once the search access path has been built, the following query can be optimized:
SELECT * FROM test_table WHERE col2:data.search LIKE '%optimization%';
However, the following queries are not optimized because the WHERE clause filters do not apply to the field
that was specified when search optimization was enabled (col2:data.search
).
SELECT * FROM test_table WHERE col2:name LIKE '%simon%parker%';
SELECT * FROM test_table WHERE col2 LIKE '%hello%world%';
You can specify multiple fields to be optimized. Here, search optimization is enabled for two specific fields in the column col2
.
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:name);
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
If you enable search optimization for a given field, it is enabled for any subfields. The second ALTER TABLE statement
below is redundant because the first statement enables search optimization for the entire data
field, including
the nested search
field.
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data);
ALTER TABLE test_table ADD SEARCH OPTIMIZATION ON SUBSTRING(col2:data.search);
Similarly, enabling search optimization for an entire column allows all substring searches on that column to be optimized, including fields nested to any depth within it.
How constants are evaluated for VARIANT substring searches¶
When evaluating the constant string in a query (e.g. LIKE 'constant_string'
), the search optimization service splits the
string into tokens using the following characters as delimiters:
Square brackets (
[
and]
).Curly braces (
{
and}
).Colons (
:
).Commas (
,
).Double quotes (
"
).
After splitting the string into tokens, the search optimization service considers only tokens that are at least 5 characters long.
Example of a Predicate |
How the Search Optimization Service Handles the Query |
---|---|
|
The search optimization service does not use search access paths for the following predicate because the substring is shorter than 5 characters. |
|
The search optimization service can optimize this query, using search access paths to search for |
|
The search optimization service can optimize this query, using search access paths to search for |
|
The search optimization service splits this string into |
|
The search optimization service splits this into the tokens |
|
The search optimization service splits this string into the tokens |
Current limitations in support for VARIANT types¶
Currently, support for VARIANT types in the search optimization service has the following limitations:
Predicates that use XMLGET are not supported.
Predicates of the form
variant_field IS NULL
are not supported.Predicates where the constants are results of scalar subqueries are not supported.
Predicates that specify paths to elements that contain sub-elements are not supported.
The current limitations of the search optimization service also apply to this feature.