Query a Cortex Search Service¶

When you create a Cortex Search Service, the system provisions an API endpoint to serve queries at low latency. You can use three APIs for querying a Cortex Search Service:

The Python API
The REST API
The SQL SEARCH_PREVIEW Function

Parameters¶

All APIs support the same set of query parameters:

Requirement	Parameter	Description
Required	`query`	The search query, matched against the text column of the service.
Optional	`columns`	A comma-separated list of columns to return for each relevant result in the response. These columns must be included in the source query for the service. If this parameter is not provided, only the search column is returned in the response.
Optional	`filter`	A filter object for filtering results based on data in the `ATTRIBUTES` columns. See Filter syntax.
Optional	`scoring_config`	Configuration object for customizing search ranking behavior. See Customizing Cortex Search Scoring for syntax.
Optional	`scoring_profile`	The named scoring profile to be used with the query, previously defined with ALTER CORTEX SEARCH SERVICE … ADD SCORING PROFILE. If `scoring_profile` is provided, any `scoring_config` provided is ignored.
Optional	`limit`	Maximum number of results to return in the response, up to 1000. The default limit is 10.
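
As a rough illustration of how these parameters fit together, the following sketch assembles them into one dictionary that could serve as a request body. The helper name `build_query_payload` is hypothetical and not part of any Snowflake library, and the lower bound of 1 on `limit` is an assumption; only the upper bound of 1000 and the default of 10 come from the table above.

```python
from typing import Any, Optional

MAX_LIMIT = 1000    # upper bound on `limit` stated in the parameter table
DEFAULT_LIMIT = 10  # default used when `limit` is omitted


def build_query_payload(
    query: str,
    columns: Optional[list[str]] = None,
    filter: Optional[dict[str, Any]] = None,
    scoring_config: Optional[dict[str, Any]] = None,
    scoring_profile: Optional[str] = None,
    limit: int = DEFAULT_LIMIT,
) -> dict[str, Any]:
    """Hypothetical helper: collect the query parameters into one dict."""
    if not query:
        raise ValueError("`query` is required")
    # Assumption: a limit below 1 is treated as invalid.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"`limit` must be between 1 and {MAX_LIMIT}")

    payload: dict[str, Any] = {"query": query, "limit": limit}
    if columns:
        payload["columns"] = list(columns)
    if filter is not None:
        payload["filter"] = filter
    if scoring_profile is not None:
        # A named profile takes precedence, so scoring_config is left out.
        payload["scoring_profile"] = scoring_profile
    elif scoring_config is not None:
        payload["scoring_config"] = scoring_config
    return payload
```

Dropping `scoring_config` on the client side mirrors the documented behavior that it is ignored when `scoring_profile` is present; sending both would presumably have the same effect.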

Syntax¶

The following examples show how to query a Cortex Search Service using each of the three APIs (Python, REST, and SQL, in that order):

import os
from snowflake.core import Root
from snowflake.snowpark import Session

# connect to Snowflake
CONNECTION_PARAMETERS = { ... }
session = Session.builder.configs(CONNECTION_PARAMETERS).create()
root = Root(session)

# fetch service
my_service = (root
    .databases["<service_database>"]
    .schemas["<service_schema>"]
    .cortex_search_services["<service_name>"]
)

# query service
resp = my_service.search(
    query="<query>",
    columns=["<col1>", "<col2>"],
    filter={"@eq": {"<column>": "<value>"} },
    limit=5
)
print(resp.to_json())


curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "<search_query>",
  "columns": ["col1", "col2"],
  "filter": <filter>,
  "limit": <limit>
}'


SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'my_search_service',
      '{
         "query": "preview query",
         "columns":[
            "col1",
            "col2"
         ],
         "filter": {"@eq": {"col1": "filter value"} },
         "limit":10
      }'
  )
)['results'] as results;

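
Whichever API is used, the matching rows arrive under a `results` key, as the SQL example's `['results']` accessor suggests. The following sketch parses such a response in Python. The helper name `extract_results` and the sample response body are hypothetical; the only shape taken from the examples above is the top-level `results` array.

```python
import json


def extract_results(response_json: str) -> list[dict]:
    """Hypothetical helper: return the rows under the top-level "results" key."""
    body = json.loads(response_json)
    return body.get("results", [])


# Made-up response body, for illustration only.
sample = '{"results": [{"col1": "a", "col2": "b"}]}'
for row in extract_results(sample):
    print(row["col1"], row["col2"])
```

With the Python API, the string returned by `resp.to_json()` could be passed to this helper.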

Setup and authentication¶

Python API¶

Cortex Search Services may be queried using version 0.8.0 or later of the Snowflake Python APIs. See Snowflake Python APIs: Managing Snowflake objects with Python for more information on the Snowflake Python APIs.

Install the Snowflake Python API library¶

First, install the latest version of the Snowflake Python APIs package from PyPI. See Install the Snowflake Python APIs library for detailed instructions.

pip install snowflake -U


Connect to Snowflake¶

Connect to Snowflake using either a Snowpark Session or a Python Connector Connection and create a Root object. See Connect to Snowflake with the Snowflake Python APIs for more instructions on connecting to Snowflake. The following example uses the Snowpark Session object and a Python dictionary for configuration.

import os
from snowflake.core import Root
from snowflake.snowpark import Session

CONNECTION_PARAMETERS = {
    "account": os.environ["snowflake_account_demo"],
    "user": os.environ["snowflake_user_demo"],
    "password": os.environ["snowflake_password_demo"],
    "role": "test_role",
    "database": "test_database",
    "warehouse": "test_warehouse",
    "schema": "test_schema",
}

session = Session.builder.configs(CONNECTION_PARAMETERS).create()
root = Root(session)


Note

Version 0.8.0 or later of the Snowflake Python APIs library is required to query a Cortex Search Service.

REST API¶

Cortex Search exposes a REST API endpoint in the suite of Snowflake REST APIs. The REST endpoint generated for a Cortex Search Service is of the following structure:

https://<account_url>/api/v2/databases/<db_name>/schemas/<schema_name>/cortex-search-services/<service_name>:query


Where:

<account_url>: Your Snowflake Account URL. See Finding the organization and account name for an account for instructions on finding your account URL.
<db_name>: Database in which the service resides.
<schema_name>: Schema in which the service resides.
<service_name>: Name of the service.
:query: The method to invoke on the service; in this case, the query method.

For additional details, see the REST API reference for Cortex Search Service.
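
As a minimal sketch of the URL structure described above, the following function builds the endpoint from its parts. The function name `query_url` and the account hostname in the usage note are hypothetical; percent-encoding each path segment is a defensive assumption, not a documented requirement.

```python
from urllib.parse import quote


def query_url(account_url: str, db_name: str, schema_name: str, service_name: str) -> str:
    """Hypothetical helper: build the :query endpoint for a Cortex Search Service."""
    # Accept the account URL with or without a scheme or trailing slash.
    host = account_url.removeprefix("https://").rstrip("/")
    path = "/".join([
        "api", "v2",
        "databases", quote(db_name, safe=""),
        "schemas", quote(schema_name, safe=""),
        "cortex-search-services", quote(service_name, safe=""),
    ])
    return f"https://{host}/{path}:query"
```

For example, `query_url("myorg-myaccount.snowflakecomputing.com", "DB", "SCH", "SVC")` yields a URL ending in `/cortex-search-services/SVC:query`.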

Authentication¶

Snowflake REST APIs support authentication via programmatic access tokens (PATs), key pair authentication using JSON Web Tokens (JWTs), and OAuth. For details, see Authenticating Snowflake REST APIs with Snowflake.
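
The following sketch shows one way to prepare an authenticated request with the Python standard library, using the same headers as the curl examples on this page. It covers only the PAT case with a bearer token; JWT and OAuth may require additional headers, which are not shown here. The function name `build_query_request` is hypothetical.

```python
import json
import urllib.request


def build_query_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) a POST to the :query endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Same header as the curl examples: a PAT sent as a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The prepared request can then be sent with `urllib.request.urlopen(req)`. Reading the token from an environment variable keeps it out of source code.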

SQL SEARCH_PREVIEW function¶

The SNOWFLAKE.CORTEX.SEARCH_PREVIEW function allows you to preview the results of individual queries to a Cortex Search Service from within a SQL environment such as a worksheet or Snowflake notebook cell. This function makes it easy to interactively validate that a service has populated correctly and is serving reasonable results.

Important

The SEARCH_PREVIEW function is provided for testing and validation of Cortex Search Services. It is not intended for serving search queries in an end-user application.

The function operates only on string literals. It does not accept batch text data.
The function has higher latency than the REST and Python APIs.
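
Because the function takes string literals, the JSON argument has to be embedded in the SQL text itself. The following sketch renders such a statement from a Python dictionary for interactive testing. The helper name `search_preview_sql` is hypothetical, and the escaping (doubling backslashes and single quotes) reflects an assumption about how single-quoted string constants are parsed; verify it against your own queries before relying on it.

```python
import json


def search_preview_sql(service_name: str, params: dict) -> str:
    """Hypothetical helper: render a SEARCH_PREVIEW statement with literal arguments."""

    def sql_literal(text: str) -> str:
        # Assumption: backslashes and single quotes must both be escaped
        # inside a single-quoted SQL string constant.
        return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"

    return (
        "SELECT PARSE_JSON(\n"
        "  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(\n"
        f"    {sql_literal(service_name)},\n"
        f"    {sql_literal(json.dumps(params))}\n"
        "  )\n"
        ")['results'] AS results;"
    )


print(search_preview_sql("my_search_service", {"query": "preview query", "limit": 10}))
```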

Filter syntax¶

Cortex Search supports filtering on the ATTRIBUTES columns specified in the CREATE CORTEX SEARCH SERVICE command.

Cortex Search supports five matching operators:

TEXT or NUMERIC equality: @eq
ARRAY contains: @contains
NUMERIC or DATE/TIMESTAMP greater than or equal to: @gte
NUMERIC or DATE/TIMESTAMP less than or equal to: @lte
Primary key equality: @primarykey

These matching operators can be composed with various logical operators:

@and
@or
@not
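
Filter objects are plain JSON, so in Python they can be composed from small functions that return dictionaries. The following is a minimal sketch; the function names (`eq`, `contains`, `gte`, `lte`, `all_of`, `any_of`, `negate`) are hypothetical conveniences and not part of the Snowflake Python APIs.

```python
import json


# Matching operators: each returns a single-operator filter object.
def eq(column, value):
    return {"@eq": {column: value}}


def contains(column, value):
    return {"@contains": {column: value}}


def gte(column, value):
    return {"@gte": {column: value}}


def lte(column, value):
    return {"@lte": {column: value}}


# Logical operators: each wraps other filter objects.
def all_of(*filters):
    return {"@and": list(filters)}


def any_of(*filters):
    return {"@or": list(filters)}


def negate(inner):
    return {"@not": inner}


combined = all_of(contains("array_col", "arr_value"), negate(eq("string_col", "value")))
print(json.dumps(combined, indent=2))
```

The resulting dictionary can be passed as the `filter` argument of the Python API, or serialized with `json.dumps` for the REST API and SEARCH_PREVIEW.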

Usage notes¶

Matching against NaN (‘not a number’) values in the source query is handled as described in Special values.
Fixed-point numeric values with more than 19 digits (not including leading zeroes) do not work with @eq, @gte, or @lte and will not be returned by these operators (although they could still be returned by the overall query with the use of @not).
TIMESTAMP and DATE filters accept values of the form YYYY-MM-DD and, for timezone-aware dates, YYYY-MM-DD+HH:MM. If the timezone offset is not specified, the date is interpreted in UTC.
@primarykey is only supported for services configured with a primary key. The value of the filter must be a JSON object mapping every primary key column to its corresponding value (or NULL).
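
To illustrate the last note, the following sketch builds an `@primarykey` filter that always includes every primary key column. The helper name `primary_key_filter` is hypothetical, and representing a NULL key value as Python `None` (serialized as JSON `null`) is an assumption.

```python
import json


def primary_key_filter(pk_columns: list[str], values: dict) -> dict:
    """Hypothetical helper: map every primary key column to a value or None."""
    unknown = set(values) - set(pk_columns)
    if unknown:
        raise ValueError(f"not primary key columns: {sorted(unknown)}")
    # Columns without a supplied value are sent explicitly as null.
    return {"@primarykey": {col: values.get(col) for col in pk_columns}}


flt = primary_key_filter(["region", "agent_id"], {"region": "us-west-1"})
print(json.dumps(flt))
```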

These operators can be combined into a single filter object.

Example¶

Filtering on rows where the string-like column string_col is equal to the value value:
```
{ "@eq": { "string_col": "value" } }
```
Filtering to the row with the primary key values us-west-1 in the region column and abc123 in the agent_id column:
```
{ "@primarykey": { "region": "us-west-1", "agent_id": "abc123" } }
```
Filtering on rows where the ARRAY column array_col contains the value arr_value:
```
{ "@contains": { "array_col": "arr_value" } }
```

Filtering on rows where NUMERIC column numeric_col is between 10.5 and 12.5 (inclusive):

{
  "@and": [
    { "@gte": { "numeric_col": 10.5 } },
    { "@lte": { "numeric_col": 12.5 } }
  ]
}


Filtering on rows where TIMESTAMP column timestamp_col is between 2024-11-19 and 2024-12-19 (inclusive).

{
  "@and": [
    { "@gte": { "timestamp_col": "2024-11-19" } },
    { "@lte": { "timestamp_col": "2024-12-19" } }
  ]
}

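
Both range examples share one shape: an `@and` of `@gte` and `@lte` on the same column. The following sketch wraps that pattern and formats `datetime.date` bounds as YYYY-MM-DD. The helper name `between` is hypothetical; rejecting `datetime` values is a deliberate simplification, since only the date forms are documented here.

```python
from datetime import date, datetime


def between(column: str, low, high) -> dict:
    """Hypothetical helper: inclusive range filter on one column."""

    def encode(value):
        if isinstance(value, datetime):
            # Only date-level values are covered by this sketch.
            raise TypeError("pass a datetime.date, not a datetime")
        if isinstance(value, date):
            return value.isoformat()  # YYYY-MM-DD
        return value

    return {
        "@and": [
            {"@gte": {column: encode(low)}},
            {"@lte": {column: encode(high)}},
        ]
    }


print(between("timestamp_col", date(2024, 11, 19), date(2024, 12, 19)))
```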

Composing filters with logical operators:

// Rows where the "array_col" column contains "arr_value" and the "string_col" column equals "value"
{
  "@and": [
    { "@contains": { "array_col": "arr_value" } },
    { "@eq": { "string_col": "value" } }
  ]
}

// Rows where the "string_col" column does not equal "value"
{
  "@not": { "@eq": { "string_col": "value" } }
}

// Rows where the "array_col" column contains at least one of "val1", "val2", or "val3"
{
  "@or": [
    { "@contains": { "array_col": "val1" } },
    { "@contains": { "array_col": "val2" } },
    { "@contains": { "array_col": "val3" } }
  ]
}

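
To make the intended meaning of the operators concrete, the following sketch evaluates a filter object against an in-memory row. It is a local illustration only, not how the service evaluates filters: it omits `@primarykey`, and it does not model the handling of NaN, NULL, or large fixed-point values described in the usage notes. The function name `matches` is hypothetical.

```python
def matches(flt: dict, row: dict) -> bool:
    """Hypothetical local evaluator for a subset of the filter syntax."""
    (op, arg), = flt.items()  # each filter object has exactly one operator
    if op == "@and":
        return all(matches(f, row) for f in arg)
    if op == "@or":
        return any(matches(f, row) for f in arg)
    if op == "@not":
        return not matches(arg, row)

    (column, value), = arg.items()  # matching operators take one column
    cell = row.get(column)
    if op == "@eq":
        return cell == value
    if op == "@contains":
        return cell is not None and value in cell
    if op == "@gte":
        return cell is not None and cell >= value
    if op == "@lte":
        return cell is not None and cell <= value
    raise ValueError(f"unsupported operator: {op}")


row = {"string_col": "value", "array_col": ["val2"], "numeric_col": 11.0}
flt = {"@or": [{"@contains": {"array_col": "val1"}},
               {"@contains": {"array_col": "val2"}}]}
print(matches(flt, row))
```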

Access control requirements¶

The role that is querying the Cortex Search Service must have the following privileges to retrieve results:

Privilege	Object
USAGE	The Cortex Search Service
USAGE	The database in which the Cortex Search Service resides
USAGE	The schema in which the Cortex Search Service resides
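
As a sketch of what granting these privileges might look like, the following function generates the three GRANT statements for a role. The function name `usage_grants` and the object names in the example are hypothetical, and the identifiers are assumed to be simple unquoted names.

```python
def usage_grants(database: str, schema: str, service: str, role: str) -> list[str]:
    """Hypothetical helper: GRANT statements for the privileges in the table above."""
    return [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role};",
        f"GRANT USAGE ON CORTEX SEARCH SERVICE {database}.{schema}.{service} TO ROLE {role};",
    ]


for statement in usage_grants("test_database", "test_schema", "my_search_service", "test_role"):
    print(statement)
```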

Querying with owner’s rights¶

Cortex Search Services perform searches with owner’s rights and follow the same security model as other Snowflake objects that run with owner’s rights.

In particular, this means that any role with sufficient privileges to query a Cortex Search Service may query any of the data the service has indexed, regardless of that role’s privileges on the underlying objects (such as tables and views) referenced in the service’s source query.

For example, for a Cortex Search Service that references a table with row-level masking policies, querying users of that service will be able to see search results from rows on which the owner’s role has read permission, even if the querying user’s role cannot read those rows in the source table.

For example, use caution when granting another Snowflake user a role that has the USAGE privilege on a Cortex Search Service.

Known limitations¶

Querying a Cortex Search Service is subject to the following limitations:

Response size: The total size of the response payload returned from a search query to a Cortex Search Service must not exceed the following limits:
- REST API and Python API: 10 Megabytes (MB)
- SQL SEARCH_PREVIEW Function: 300 Kilobytes (KB)
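
One way to reason about these limits is to measure the serialized size of a representative result set. The following sketch does that for an in-memory list of rows. The function name and the decimal interpretation of MB and KB (powers of 1000) are assumptions, and the size of the actual response may differ from this local estimate.

```python
import json

# Assumption: decimal units (1 MB = 1,000,000 bytes, 1 KB = 1,000 bytes).
RESPONSE_LIMIT_BYTES = {
    "rest": 10 * 1000 * 1000,
    "python": 10 * 1000 * 1000,
    "search_preview": 300 * 1000,
}


def fits_response_limit(results: list[dict], api: str) -> bool:
    """Hypothetical helper: estimate whether rows fit within an API's response limit."""
    size = len(json.dumps({"results": results}).encode("utf-8"))
    return size <= RESPONSE_LIMIT_BYTES[api]


rows = [{"DOCUMENT_CONTENTS": "x" * 400_000}]
print(fits_response_limit(rows, "rest"), fits_response_limit(rows, "search_preview"))
```

If an estimate comes close to the limit, lowering `limit` or requesting fewer `columns` reduces the payload.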

Examples¶

This section provides comprehensive examples for querying Cortex Search Services across all three API methods.

Setup for examples¶

The following examples use a table named business_documents with timestamp and numeric columns for demonstrating various features:

CREATE OR REPLACE TABLE business_documents (
    document_contents VARCHAR,
    last_modified_timestamp TIMESTAMP,
    created_timestamp TIMESTAMP,
    likes INT,
    comments INT
);

INSERT INTO business_documents (document_contents, last_modified_timestamp, created_timestamp, likes, comments)
VALUES
    ('Quarterly financial report for Q1 2024: Revenue increased by 15%, with expenses stable.',
     '2024-01-12 10:00:00', '2024-01-10 09:00:00', 10, 20),

    ('IT manual for employees: Instructions for usage of internal technologies, including hardware.',
     '2024-02-10 15:00:00', '2024-02-05 14:30:00', 85, 10),

    ('Employee handbook 2024: Updated policies on remote work, health benefits, and company culture.',
     '2024-02-10 15:00:00', '2024-02-05 14:30:00', 85, 10),

    ('Marketing strategy document: Target audience segmentation for upcoming product launch.',
     '2024-03-15 12:00:00', '2024-03-12 11:15:00', 150, 32),

    ('Product roadmap 2024: Key milestones for tech product development, including the launch.',
     '2024-04-22 17:30:00', '2024-04-20 16:00:00', 200, 45),

    ('Annual performance review process guidelines: Procedures for managers to conduct employee.',
     '2024-05-02 09:30:00', '2024-05-01 08:45:00', 60, 5);

CREATE OR REPLACE CORTEX SEARCH SERVICE business_documents_css
    ON document_contents
    ATTRIBUTES likes, comments
    WAREHOUSE = <warehouse_name>
    TARGET_LAG = '1 minute'
AS SELECT * FROM business_documents;


Filter examples¶

Simple query with an equality filter¶

resp = business_documents_css.search(
    query="technology",
    columns=["DOCUMENT_CONTENTS", "LIKES"],
    filter={"@eq": {"LIKES": 85}},
    limit=5
)

curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "technology",
  "columns": ["DOCUMENT_CONTENTS", "LIKES"],
  "filter": {"@eq": {"LIKES": 85}},
  "limit": 5
}'

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'business_documents_css',
      '{
         "query": "technology",
         "columns": ["DOCUMENT_CONTENTS", "LIKES"],
         "filter": {"@eq": {"LIKES": 85}},
         "limit": 5
      }'
  )
)['results'] as results;

Range filter¶

resp = business_documents_css.search(
    query="business",
    columns=["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
    filter={"@and": [
        {"@gte": {"LIKES": 50}},
        {"@lte": {"COMMENTS": 50}}
    ]},
    limit=10
)


curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "business",
  "columns": ["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
  "filter": {"@and": [
    {"@gte": {"LIKES": 50}},
    {"@lte": {"COMMENTS": 50}}
  ]},
  "limit": 10
}'


SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'business_documents_css',
      '{
         "query": "business",
         "columns": ["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
         "filter": {"@and": [
           {"@gte": {"LIKES": 50}},
           {"@lte": {"COMMENTS": 50}}
         ]},
         "limit": 10
      }'
  )
)['results'] as results;


Scoring examples¶

Numeric boosts¶

Apply numeric boosts to both the likes and comments columns, with twice the boost weight on comments values relative to likes values.

resp = business_documents_css.search(
    query="technology",
    columns=["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
    scoring_config={
        "functions": {
            "numeric_boosts": [
                {"column": "comments", "weight": 2},
                {"column": "likes", "weight": 1}
            ]
        }
    }
)


curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "technology",
  "columns": ["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
  "scoring_config": {
    "functions": {
      "numeric_boosts": [
        {"column": "comments", "weight": 2},
        {"column": "likes", "weight": 1}
      ]
    }
  }
}'


SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'business_documents_css',
      '{
         "query": "technology",
         "columns": ["DOCUMENT_CONTENTS", "LIKES", "COMMENTS"],
         "scoring_config": {
           "functions": {
             "numeric_boosts": [
               {"column": "comments", "weight": 2},
               {"column": "likes", "weight": 1}
             ]
           }
         }
      }'
  )
)['results'] as results;


In the results, note:

With the boosts, the “Product roadmap 2024:…” document is the top result because of its large number of likes and comments, even though it has slightly lower relevance to the query “technology”.

Without any boosts, the top result for the query is “IT manual for employees:…”
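
When boost weights are tuned repeatedly, it can help to generate the `scoring_config` from a simple mapping of column to weight. The following sketch does that; the helper name `numeric_boosts_config` is hypothetical, while the output shape follows the examples above.

```python
def numeric_boosts_config(weights: dict[str, float]) -> dict:
    """Hypothetical helper: build a scoring_config with one boost per column."""
    return {
        "functions": {
            "numeric_boosts": [
                {"column": column, "weight": weight}
                for column, weight in weights.items()
            ]
        }
    }


# Twice the weight on comments relative to likes, as in the example above.
scoring_config = numeric_boosts_config({"comments": 2, "likes": 1})
print(scoring_config)
```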

Time decays¶

Apply time decays based on the LAST_MODIFIED_TIMESTAMP column, where:

Documents with more recent LAST_MODIFIED_TIMESTAMP values, relative to the now timestamp, are boosted

Documents with a LAST_MODIFIED_TIMESTAMP value more than 240 hours before the now timestamp receive little boosting

resp = business_documents_css.search(
    query="technology",
    columns=["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
    scoring_config={
        "functions": {
            "time_decays": [
                {"column": "LAST_MODIFIED_TIMESTAMP", "weight": 1, "limit_hours": 240, "now": "2024-04-23T00:00:00.000-08:00"}
            ]
        }
    }
)


curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "technology",
  "columns": ["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
  "scoring_config": {
    "functions": {
      "time_decays": [
        {"column": "LAST_MODIFIED_TIMESTAMP", "weight": 1, "limit_hours": 240, "now": "2024-04-23T00:00:00.000-08:00"}
      ]
    }
  }
}'


SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'business_documents_css',
      '{
         "query": "technology",
         "columns": ["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
         "scoring_config": {
           "functions": {
             "time_decays": [
               {"column": "LAST_MODIFIED_TIMESTAMP", "weight": 1, "limit_hours": 240, "now": "2024-04-23T00:00:00.000-08:00"}
             ]
           }
         }
      }'
  )
)['results'] as results;


In the results, note:

With the decays, the “Product roadmap 2024:…” document is the top result because of its recency to the now timestamp, even though it has slightly lower relevance to the query “technology”.

Without any decays, the top result for the query is “IT manual for employees:…”
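
The `now` value in the examples above is a timestamp string with millisecond precision and a UTC offset. The following sketch produces a string of that form from a timezone-aware `datetime`. The helper name `time_decay_config` is hypothetical, and requiring a timezone-aware value is a precaution chosen for this sketch rather than a documented rule.

```python
from datetime import datetime, timedelta, timezone


def time_decay_config(column: str, limit_hours: int, now: datetime, weight: float = 1) -> dict:
    """Hypothetical helper: build a scoring_config with a single time decay."""
    if now.tzinfo is None:
        raise ValueError("`now` must be timezone-aware")
    return {
        "functions": {
            "time_decays": [
                {
                    "column": column,
                    "weight": weight,
                    "limit_hours": limit_hours,
                    # Same form as the examples: milliseconds plus UTC offset.
                    "now": now.isoformat(timespec="milliseconds"),
                }
            ]
        }
    }


reference = datetime(2024, 4, 23, tzinfo=timezone(timedelta(hours=-8)))
print(time_decay_config("LAST_MODIFIED_TIMESTAMP", 240, reference))
```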

Disabling reranking¶

To disable reranking:

resp = business_documents_css.search(
    query="technology",
    columns=["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
    limit=5,
    scoring_config={
        "reranker": "none"
    }
)


curl --location https://<ACCOUNT_URL>/api/v2/databases/<DB_NAME>/schemas/<SCHEMA_NAME>/cortex-search-services/<SERVICE_NAME>:query \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $PAT" \
--data '{
  "query": "technology",
  "columns": ["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
  "scoring_config": {
    "reranker": "none"
  }
}'


SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'business_documents_css',
      '{
         "query": "technology",
         "columns": ["DOCUMENT_CONTENTS", "LAST_MODIFIED_TIMESTAMP"],
         "scoring_config": {
           "reranker": "none"
         }
      }'
  )
)['results'] as results;


Tip

To query a service with the reranker, omit the "reranker": "none" parameter from the scoring_config object, as reranking is the default behavior.