Improve literal search to enhance Cortex Analyst responses

This topic describes ways to improve literal string searches to help Cortex Analyst generate more accurate SQL queries. Writing the correct SQL query to answer a question sometimes requires knowing exact literal values to filter on. Since those values can’t always be extracted directly from the question, a search of some kind may be needed.

For example, if a user asks a question such as:

What was my overall sales of iced tea in Q1?
Copy

You might try the following query:

SELECT DISTINCT name FROM product WHERE name LIKE '%iced%tea%'
Copy

If you’ve ever gone through this process yourself, you’ll know that this isn’t a perfect solution. For example, this query won’t show you any products named “Ice Tea”, but it will show you some “spiced tea”.

Cortex Analyst offers two solutions to help improve literal usage:

This is where integrating with Cortex Search can help. Cortex Search is a feature that enables low-latency, high-quality “fuzzy” search over text data. You can create a Cortex Search service to do a semantic search over the underlying database column to find any literal values needed for Cortex Analyst to use in the SQL query that answers the user’s question.

Semantic search over sample values

For dimensions with relatively low-cardinality (about 1 - 10 distinct values), using a sample value search by specifying enough sample values to show the structure of the response for the dimension is recommended. This solution requires no additional storage besides the minimal increase to the semantic model size.

Before Cortex Analyst generates a SQL query for your question, it does a semantic similarity search between your question and the provided sample values to identify any appropriate literal values that may be needed to write your query. Note that the semantic similarity search may retrieve more relevant literals than the fuzzy string matching query approach mentioned above.

Only a fixed-sized set of retrieved sample values will be presented to the LLM as literals that may be needed to write the SQL query. That means adding more sample values does not put you at risk of exceeding the LLM’s context window.

Semantic search using Cortex Search Service

For dimensions with higher cardinality (more than 10 distinct values) or dimensions whose values change frequently, you can use a Cortex Search Service to search through the literals. This solution reduces data duplication and keeps your semantic model concise.

Note

Cortex Search Services do come with additional storage and compute costs. For details, see Cost considerations.

In this preview, only a single Cortex Search Service per logical dimension is supported.

Setting Up Cortex Search Service

The following steps show how to set up a Cortex Search Service for a logical dimension in your Cortex Analyst semantic model:

  1. Create Cortex Search Service

    CREATE OR REPLACE CORTEX SEARCH SERVICE my_logical_dimension_search_service
      ON my_dimension
      WAREHOUSE = xsmall
      TARGET_LAG = '1 hour'
      AS (
          SELECT DISTINCT my_dimension FROM my_logical_dimension_landing_table
      );`
    
    Copy
  2. Include the Cortex Search service in your semantic model using the following yaml snippet:

    tables:
    
      - name: my_table
    
        base_table:
          database: my_database
          schema: my_schema
          table: my_table
    
        dimensions:
          - name: my_dimension
            expr: my_column
            cortex_search_service:
              service: my_logical_dimension_search_service
              literal_column: my_column     # optional
              database: my_search_database  # optional
              schema: my_search_schema      # optional
    
    Copy

    The following fields are optional under cortex_search_service:

    • literal_column: Defaults to the search index.

    • database: Defaults to the database of the specified base table.

    • schema: Defaults to the schema of the specified base table.