Manage Cortex Search costs¶
Cortex Search Services incur three types of costs:
| Category | Phase | Service type | Description |
|---|---|---|---|
| Virtual warehouse | Indexing | WAREHOUSE_METERING | Cortex Search uses a virtual warehouse to orchestrate embedding jobs, materialize the source query, and build the search index. You incur warehouse costs only during creation and refresh of a search service. |
| EMBED_TEXT | Indexing | AI_SERVICES | Behind the scenes, Cortex Search uses vector embeddings for semantic search, calling EMBED_TEXT_768 or EMBED_TEXT_1024 to convert each document into a series of numbers that encodes its meaning. Embeddings are computed each time a row is inserted or updated. |
| Serving | Serving | AI_SERVICES | After documents have been indexed, Cortex Search runs a low-latency, high-throughput service that serves queries against the index. You incur these costs while the service is available to respond to queries, even if no queries are served during a given period. |
This topic provides recommendations for managing these costs.
Managing indexing costs¶
- Minimize warehouse size
Most services do not see improved indexing performance beyond a LARGE warehouse, and many need only MEDIUM. Most of the compute time used in building an index is consumed by the text embedding function, which does not benefit from more cores or additional memory once it has sufficient resources.
- Suspend indexing when freshness isn’t important
Suspend indexing (or increase the target lag) when you don't need changes in your documents to be immediately propagated to the search service (that is, when freshness isn't as important during some period); see the ALTER sketch after this list.
- Set target lag according to business requirements
Not every search application requires real-time indexing. A target lag that is too low may cause your index to be refreshed more frequently than necessary. For example, if your source data updates every five minutes, but the consumer of the data only queries the search service once an hour, set the target lag to one hour, not five minutes.
- Bundle changes together
There is a fixed component to the cost of an update, so fewer, larger updates are less expensive than more frequent, smaller ones. Likewise, any change to any value in a row causes that row's search column to be re-embedded, even if the data in the search column is unchanged, so it is better to accumulate all the changes to a row into a single update.
- Minimize changes to the source data
Any change to the schema of the source query causes a full refresh of the service, including vector embeddings and indexes. When you create a large service, consider including extra payload columns for later use, so you don't need to trigger a full refresh by changing the schema when you need to add a column (see the CREATE sketch after this list). The cost of the additional columns is low.
Tip
Materializing data in a table in the source query with a CREATE OR REPLACE command causes the service to fully refresh and re-embed all vectors. It's better to update the source table incrementally, for example with MERGE INTO; see the MERGE sketch after this list.
- Keep the source query as simple as possible
Joins or other complex operations can add to indexing cost (and may be better to apply during ETL or at another stage). Refer to the Dynamic Tables Best Practices for more information on optimizing pipelines.
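As an illustration of the first two recommendations, here is a minimal ALTER sketch for a hypothetical service my_db.my_schema.my_svc; it assumes the SET TARGET_LAG and SUSPEND/RESUME INDEXING clauses of ALTER CORTEX SEARCH SERVICE:

```sql
-- Hypothetical service name; adjust to your own.
-- Refresh the index at most once an hour instead of continuously.
ALTER CORTEX SEARCH SERVICE my_db.my_schema.my_svc SET TARGET_LAG = '1 hour';

-- Pause indexing entirely while freshness doesn't matter ...
ALTER CORTEX SEARCH SERVICE my_db.my_schema.my_svc SUSPEND INDEXING;

-- ... and resume it when fresh results matter again.
ALTER CORTEX SEARCH SERVICE my_db.my_schema.my_svc RESUME INDEXING;
```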
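A CREATE sketch of the payload-column idea, assuming a hypothetical source_table with search column transcript; spare1 and spare2 are unused payload columns reserved so a later addition doesn't require a schema change:

```sql
-- Hypothetical service over source_table; the spare columns are cheap to
-- carry and avoid a full refresh when you later need another field.
CREATE CORTEX SEARCH SERVICE my_db.my_schema.my_svc
  ON transcript
  ATTRIBUTES region
  WAREHOUSE = my_wh
  TARGET_LAG = '1 hour'
  AS SELECT id, transcript, region, spare1, spare2 FROM source_table;
```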
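And a MERGE sketch of the incremental-update pattern from the tip above, assuming hypothetical tables source_table and staged_changes keyed by id:

```sql
-- Apply a batch of accumulated changes in place; unlike CREATE OR REPLACE,
-- this avoids a full refresh and re-embeds only the rows that changed.
MERGE INTO source_table AS t
USING staged_changes AS s
  ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.body = s.body
WHEN NOT MATCHED THEN INSERT (id, body) VALUES (s.id, s.body);
```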
Managing serving costs¶
- Suspend judiciously
A running search service incurs costs even if it is not serving queries. Suspend the service when it is not needed, for example during development, as in the sketch below. It typically takes only a few minutes to resume a suspended service.
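A minimal sketch, again for a hypothetical my_db.my_schema.my_svc and assuming the SUSPEND/RESUME SERVING clauses of ALTER CORTEX SEARCH SERVICE:

```sql
-- Stop incurring serving costs while the service is idle ...
ALTER CORTEX SEARCH SERVICE my_db.my_schema.my_svc SUSPEND SERVING;

-- ... and bring it back online when queries are expected again.
ALTER CORTEX SEARCH SERVICE my_db.my_schema.my_svc RESUME SERVING;
```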
Observing costs¶
To learn more about the costs of your Cortex Search services, use the following Account Usage views.
- CORTEX_SEARCH_DAILY_USAGE_HISTORY view contains daily totals for EMBED_TEXT and serving credit usage per service. Snowflake intends to also provide virtual warehouse usage in this view in the future.
- CORTEX_SEARCH_SERVING_USAGE_HISTORY view includes hourly serving credits per service.
Snowflake intends to make this information available in the Cortex Search administration interface in the future.
Example¶
To view the daily consumption totals for EMBED_TEXT and serving for a given service, run the following SQL query:
```sql
SELECT
  USAGE_DATE,
  CONSUMPTION_TYPE,
  CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_SEARCH_DAILY_USAGE_HISTORY
WHERE DATABASE_NAME = '<database>'
  AND SCHEMA_NAME = '<schema>'
  AND SERVICE_NAME = '<cortex_search_service>';
```
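A similar query can break serving costs down by hour. This is a sketch only: it assumes the serving view exposes START_TIME and CREDITS alongside the same identifying columns; check the view's reference for the exact schema.

```sql
SELECT
  START_TIME,   -- assumed column; verify against the view reference
  CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_SEARCH_SERVING_USAGE_HISTORY
WHERE DATABASE_NAME = '<database>'
  AND SCHEMA_NAME = '<schema>'
  AND SERVICE_NAME = '<cortex_search_service>'
ORDER BY START_TIME DESC;
```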
Estimating costs¶
The table below outlines a model to estimate the costs of a Cortex Search Service based on the three categories of costs described above. It is important to use assumptions for table size and other factors that closely match the data you want to make searchable.
| Category | Phase | Service type | Description |
|---|---|---|---|
| Virtual warehouse | Indexing | WAREHOUSE_METERING | Varies based on the change rate of your data, target lag, and warehouse size. |
| Text embeddings (EMBED_TEXT_TOKENS) | Indexing | AI_SERVICES | Charged per token of text in the search column, per document, at the credit rate of the selected embedding model. |
| Serving | Serving | AI_SERVICES | 6.3 credits per gigabyte-month of indexed data. |
Example¶
The data for this example is a hypothetical table with 10 million rows, averaging 500 tokens per row in the search column. We assume that 10% of rows are updated each month and an additional 1% new rows are added, while no rows are removed.
Tip
To get an accurate estimate, use the COUNT_TOKENS function with a representative sample of your actual data, as in the sketch below.
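A minimal sketch, assuming a hypothetical source table my_table with search column transcript, and assuming COUNT_TOKENS accepts your embedding model's name:

```sql
-- Average tokens per row in the search column, over a 1% row sample.
SELECT AVG(SNOWFLAKE.CORTEX.COUNT_TOKENS('snowflake-arctic-embed-l-v2.0', transcript))
         AS avg_tokens_per_row
FROM my_table SAMPLE (1);
```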
Estimated initial and ongoing costs for this service can be calculated as follows.
Note
Ongoing costs are based on the volume of data provided when the search service is created. Costs can rise over time as new rows are added.
| Category | Phase | Description |
|---|---|---|
| Virtual warehouse | Indexing | Varies based on the change rate of your data, target lag, and warehouse size. |
| EMBED_TEXT | Indexing | Initial index creation: 0.05 credits per million tokens for snowflake-arctic-embed-l-v2.0 × 10,000,000 rows × 500 tokens per row ÷ 1,000,000 tokens per million = 250 credits, one time. Ongoing monthly embedding: 0.05 credits per million tokens × 1,100,000 rows added or updated per month × 500 tokens per row ÷ 1,000,000 = 27.5 credits monthly. |
| Serving | Serving | 6.3 credits per gigabyte-month × 10,050,000 rows (average number of rows during the month) × (1,024 dimensions × 4 bytes + 1,000 bytes per row) ÷ 1,000,000,000 bytes per gigabyte = 322.65 credits monthly. |
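The same model can be run as a quick calculation. Here is a sketch using Snowflake session variables; the rates and counts are this example's assumptions, not live pricing:

```sql
-- Assumptions from the example above (not live pricing).
SET rows_total      = 10000000;          -- rows at service creation
SET rows_changed_mo = 1100000;           -- 10% updated + 1% added per month
SET rows_avg        = 10050000;          -- average rows during the month
SET tokens_per_row  = 500;
SET embed_rate      = 0.05;              -- credits per million tokens
SET serving_rate    = 6.3;               -- credits per gigabyte-month
SET bytes_per_row   = 1024 * 4 + 1000;   -- 1,024-dim float vector + ~1 KB overhead

SELECT
  $embed_rate * $rows_total      * $tokens_per_row / 1e6 AS initial_embed_credits,    -- 250
  $embed_rate * $rows_changed_mo * $tokens_per_row / 1e6 AS monthly_embed_credits,    -- 27.5
  $serving_rate * $rows_avg * $bytes_per_row / 1e9        AS monthly_serving_credits; -- ~322.65
```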