Batch Cortex Search¶
The Batch Cortex Search function is a table function that lets you submit a batch of queries to a Cortex Search Service. It is intended for offline use cases with high throughput requirements, such as entity resolution, deduplication, or clustering tasks.
Jobs submitted to a Cortex Search Service with the CORTEX_SEARCH_BATCH function use additional compute resources to provide significantly higher throughput (queries per second) than the interactive query surfaces (the Python API, the REST API, or SEARCH_PREVIEW).
Syntax¶
Use the following syntax to query a Cortex Search Service in batch mode with the CORTEX_SEARCH_BATCH table function:
SELECT
    q.query,
    r.*
FROM query_table AS q,
LATERAL CORTEX_SEARCH_BATCH(
    service_name => '<database>.<schema>.<cortex_search_service>',
    query => q.query,      -- optional STRING
    filter => q.filter,    -- optional VARIANT
    limit => 10,           -- optional INT
    options => q.options   -- optional VARIANT
) AS r;
Parameters¶
The CORTEX_SEARCH_BATCH function supports the following parameters:
service_name (string, required): Fully qualified name of the Cortex Search Service to query.
query (string, optional): Column containing the query string to search the service with.
filter (variant, optional): Column containing filter objects to apply to the search results.
limit (integer, optional): Maximum number of results to return per query. Default: 10.
options (variant, optional): Column containing additional search options and configurations.
Note
At least one of query or filter must be specified.
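As a rough sketch of how a per-row filter might be passed alongside the query string: the table name filtered_queries and the category attribute below are hypothetical, and the filter objects assume that batch mode accepts the same JSON filter syntax (for example `@eq`) as interactive Cortex Search queries, which this section does not confirm. The service would also need to expose category as a filterable attribute.

```sql
-- Hypothetical query table: one search string and one filter object per row.
CREATE OR REPLACE TABLE filtered_queries (query TEXT, filter VARIANT);

-- VARIANT values are built with PARSE_JSON inside a SELECT,
-- because INSERT ... VALUES does not accept PARSE_JSON expressions.
INSERT INTO filtered_queries
SELECT column1, PARSE_JSON(column2)
FROM VALUES
    ('wireless headphones', '{"@eq": {"category": "audio"}}'),
    ('charging cable',      '{"@eq": {"category": "accessories"}}');

-- Each row is searched with its own filter (assumed filter syntax);
-- at most 3 results are returned per query.
SELECT
    q.query,
    r.*
FROM filtered_queries AS q,
LATERAL CORTEX_SEARCH_BATCH(
    service_name => '<database>.<schema>.<cortex_search_service>',
    query => q.query,
    filter => q.filter,
    limit => 3
) AS r;
```

Keeping the filter in its own VARIANT column means each query row can carry a different filter, or NULL where no filter is wanted, provided the query column is populated for that row.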
Usage notes¶
The throughput of the batch search function may vary depending on the amount of data indexed in the queried Cortex Search Service and the complexity of the search queries. Users are encouraged to run the function on a small number of queries to measure the throughput for their specific workload. In general, queries to larger services with more filter conditions see lower throughput.
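One possible way to take such a measurement is sketched below. It assumes a query_table like the one in the Syntax section, uses an arbitrary sample size of 1000 rows, and reads the elapsed time from the session's query history. The figure includes warehouse-side overhead, so treat it as a rough estimate rather than an exact throughput number.

```sql
-- Draw a fixed-size random sample from the full query table.
CREATE OR REPLACE TEMPORARY TABLE query_sample AS
SELECT query
FROM query_table SAMPLE (1000 ROWS);

-- Run the batch search on the sample only.
SELECT
    q.query,
    r.*
FROM query_sample AS q,
LATERAL CORTEX_SEARCH_BATCH(
    service_name => '<database>.<schema>.<cortex_search_service>',
    query => q.query,
    limit => 10
) AS r;

-- Capture the ID of the batch search statement that just ran.
SET batch_query_id = LAST_QUERY_ID();

-- Approximate queries per second from the elapsed time (reported in milliseconds).
SELECT
    total_elapsed_time / 1000 AS elapsed_seconds,
    (SELECT COUNT(*) FROM query_sample)
        / NULLIF(total_elapsed_time / 1000, 0) AS approx_queries_per_second
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION())
WHERE query_id = $batch_query_id;
```

Repeating the measurement with representative filters and limits gives a more realistic estimate for the full job than a single unfiltered run.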
The throughput of the batch search function (the number of search queries processed per second) is not influenced by the size of the warehouse used to query it.
The batch search function is not optimized for quickly processing a small number of search queries. For sub-second latency on a small number of queries, use the interactive query surfaces instead (the Python API, the REST API, or SEARCH_PREVIEW).
A single Cortex Search Service can be queried in interactive and batch mode concurrently without any degradation to interactive query performance or throughput. Separate compute resources are used to serve interactive and batch queries.
There is no limit to the number of concurrent batch queries that can be run at a given time on a given service.
Cost considerations¶
During the preview, the batch search function does not incur any serving cost. The only incremental cost of running a batch job is the virtual warehouse compute. In future releases of the product, a serving cost will be incurred per unit of time that a batch job is running.
Regional availability¶
During the preview, the batch search function is available in the following regions:
AWS US East 1 (N. Virginia)
AWS US West 2 (Oregon)
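To check which region the current account is in before submitting a batch job, a query along these lines can be used. The mapping from the region names above to identifiers such as AWS_US_EAST_1 and AWS_US_WEST_2 is an assumption based on Snowflake's usual region naming, and some accounts may see the value prefixed with a region group.

```sql
-- Returns the region of the account running the current session,
-- for example AWS_US_EAST_1 or AWS_US_WEST_2.
SELECT CURRENT_REGION();
```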
Example Usage¶
This example matches products in a user-submitted order form to a "golden" product catalog.
-- Create the golden product catalog with canonical product names
CREATE OR REPLACE TABLE golden_catalog (product_name TEXT);
INSERT INTO golden_catalog VALUES
    ('Wireless Bluetooth Headphones'),
    ('Wireless Noise-Canceling Earbuds'),
    ('USB-C Charging Cable 6ft'),
    ('Portable Power Bank 10000mAh');

-- Create Cortex Search Service on the golden catalog
CREATE CORTEX SEARCH SERVICE golden_product_service
    ON product_name
    WAREHOUSE = <warehouse_name>
    TARGET_LAG = '1 day'
AS
    SELECT product_name FROM golden_catalog;

-- Create a table of user-submitted products (may contain variations or typos)
CREATE OR REPLACE TABLE submitted_products (product TEXT);
INSERT INTO submitted_products VALUES
    ('bluetooth headphones wireless'),
    ('usb c cable');
-- For each user-submitted product, query the service for the 2 closest golden results
SELECT
    q.product,
    s.*
FROM submitted_products AS q,
LATERAL CORTEX_SEARCH_BATCH(
    service_name => 'golden_product_service',
    query => q.product,
    limit => 2
) AS s;