Optimizing cost

This topic discusses ways to optimize Snowflake in order to reduce costs and get the most value from your spend.

Cost insights: Finding opportunities for savings

Snowflake provides cost insights that identify opportunities to optimize Snowflake for cost within a particular account. These insights are calculated and refreshed weekly.

Each insight indicates how many credits or terabytes of storage could be saved by optimizing Snowflake.

Note

You must be granted the ACCOUNTADMIN role to view cost insights.

To access the Cost Insights tile:

  1. Sign in to Snowsight.

  2. Switch to the ACCOUNTADMIN role.

  3. In the navigation menu, select Admin » Cost Management.

  4. Select the Account Overview tab.

  5. Find the Cost insights tile.

Each of the following insights includes suggestions on how to optimize your spend.

Insight: Rarely used tables with automatic clustering

This insight identifies tables with automatic clustering that are queried fewer than 100 times per week by this account.

Enabling automatic clustering for a table can significantly improve the performance of queries against that table. However, as the table changes, Snowflake must use serverless compute resources to keep it in a well-clustered state. If the number of queries executed against the table is minimal, the cost incurred might not justify the performance improvements.

Recommendation: Consider disabling automatic clustering on these tables. Before you turn off automatic clustering, determine whether the table exists solely for disaster recovery purposes or for use by other Snowflake accounts through data sharing, which might explain why it isn’t accessed frequently.
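
To gauge how much automatic clustering is actually costing for a given table before you decide, you could check its recent serverless credit consumption. The following is a sketch that assumes a table named t1 and access to the ACCOUNT_USAGE schema (ACCOUNT_USAGE stores object names in uppercase):

SELECT table_name, SUM(credits_used) AS credits_last_30_days
  FROM SNOWFLAKE.ACCOUNT_USAGE.AUTOMATIC_CLUSTERING_HISTORY
  WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
    AND table_name = 'T1'
  GROUP BY table_name;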

For example, to disable automatic clustering for a table named t1, execute the following command:

ALTER TABLE t1 SUSPEND RECLUSTER;
Insight: Rarely used materialized views

This insight identifies materialized views that are queried fewer than 10 times per week by this account.

Creating a materialized view can significantly improve performance for certain query patterns. However, materialized views incur additional storage costs as well as serverless compute costs associated with keeping the materialized view up-to-date with new data. If the number of queries executed against the materialized view is minimal, the cost incurred might not justify the performance improvements.

Recommendation: Consider removing or suspending updates to the materialized views. Before you drop a materialized view, determine whether it exists solely for disaster recovery purposes or for use by other Snowflake accounts through data sharing, which might explain why it isn’t accessed frequently.

For example, to delete a materialized view named mv1, execute the following command:

DROP MATERIALIZED VIEW mv1;
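
If you would rather keep the materialized view but stop incurring the serverless costs of keeping it up-to-date, you can suspend it instead. For example, to suspend updates to a materialized view named mv1:

ALTER MATERIALIZED VIEW mv1 SUSPEND;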
Insight: Rarely used search optimization paths

This insight identifies search access paths created by search optimization that are used fewer than 10 times per week by this account.

Search optimization uses search access paths to improve the performance of certain types of point lookup and analytical queries. Adding search optimization to a table can significantly improve performance for these queries. However, search optimization incurs additional storage costs as well as serverless compute costs associated with keeping that storage up-to-date. If the number of queries that use the search access path created by search optimization is minimal, the cost incurred might not justify the performance improvements.

Recommendation: Consider removing search optimization from the table. Before you remove search optimization, determine whether the table exists solely for disaster recovery purposes or for use by other Snowflake accounts through data sharing, which might explain why it isn’t accessed frequently.

For example, to completely remove search optimization from a table named t1, execute the following command:

ALTER TABLE t1 DROP SEARCH OPTIMIZATION;
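
If search optimization was added for specific columns rather than the whole table, you can also drop individual search access paths. The following is a sketch that assumes an equality search access path on a hypothetical column c1:

ALTER TABLE t1 DROP SEARCH OPTIMIZATION ON EQUALITY(c1);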
Insight: Large tables that are never queried

This insight identifies large tables that have not been queried in the last week by this account.

Recommendation: Consider deleting unused tables, which can reduce storage costs without impacting any workloads. Before you drop a table, determine whether it exists solely for disaster recovery purposes or for use by other Snowflake accounts through data sharing, which might explain why it isn’t accessed frequently.
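
To see which tables are consuming the most active storage before deciding what to drop, you could run something like the following sketch against the ACCOUNT_USAGE schema:

SELECT table_catalog, table_schema, table_name,
       ROUND(active_bytes / POWER(1024, 4), 2) AS active_tb
  FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
  WHERE deleted = FALSE
  ORDER BY active_bytes DESC
  LIMIT 10;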

For example, to delete a table named t1, execute the following command:

DROP TABLE t1;
Insight: Tables over 100 GB from which data is written but not read

This insight identifies tables over 100 GB where data is written but never read by this account.

Recommendation: It might be wasteful to store data and ingest new data into Snowflake if the data is never read. Consider deleting these tables to save on storage costs, or stop writing new data to save on credits consumed by ingestion. Before you drop a table, determine whether it exists solely for disaster recovery purposes or for use by other Snowflake accounts through data sharing, which might explain why it isn’t being read.

For example, to delete a table named t1, execute the following command:

DROP TABLE t1;
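
If you would rather stop ingesting new data than drop the table, and the data arrives through Snowpipe, you could pause the pipe instead. The following is a sketch that assumes a hypothetical pipe named my_pipe:

ALTER PIPE my_pipe SET PIPE_EXECUTION_PAUSED = TRUE;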
Insight: Short-lived permanent tables

This insight identifies tables over 100 GB that were deleted within 24 hours of their creation.

Recommendation: If data needs to be persisted for only a short time, consider using a temporary or transient table instead. Using a temporary or transient table might help you save on Fail-safe and Time Travel costs.

For example, to create a new transient table named t1 with a single integer column, execute the following command:

CREATE TRANSIENT TABLE t1 (c1 INT);
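
Similarly, if the data only needs to live for the duration of a session, you could create a temporary table instead (the column definition here is illustrative):

CREATE TEMPORARY TABLE t1 (c1 INT);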

Optimizing cloud services for cost

If you find that your cloud services usage is higher than expected, check if your use of Snowflake follows any of the following patterns. Each pattern includes a recommendation that might help you reduce costs associated with cloud services.

Pattern: Blocked queries due to transaction locks

Update and merge commands put a transaction lock on a table, which prevents other commands from executing on that table until the lock is released. Queries consume cloud services credits as they wait for a lock to be released. The longer queries stay on the cloud services layer waiting for the lock, the more credits they consume.

Recommendation: Transaction locks often occur when users run concurrent update/merge commands on a single table, especially when each command updates only a few rows. You can minimize these locks by batching changes rather than applying them one row at a time (see the sketch after these steps). In this strategy, you do the following:

  1. Use a batch INSERT statement to load new data into a temporary table. Avoid single-row inserts. Even API calls that load new data continuously should be designed to submit batch inserts rather than single-row inserts.

  2. Use the temporary table to update/merge the destination table.

If the source sends new data continuously throughout the day, consider using a task to update the destination table on a periodic basis.
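
The following is a minimal sketch of this pattern. The orders table and its columns (order_id, status, updated_at) are hypothetical; substitute your own destination table and staging logic:

-- Stage incoming changes in a temporary table using a batch insert.
CREATE TEMPORARY TABLE orders_staging LIKE orders;

INSERT INTO orders_staging (order_id, status, updated_at)
  VALUES (1001, 'SHIPPED',   CURRENT_TIMESTAMP()),
         (1002, 'CANCELLED', CURRENT_TIMESTAMP());

-- Apply all staged changes to the destination table in a single MERGE.
MERGE INTO orders t
  USING orders_staging s
    ON t.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET status = s.status, updated_at = s.updated_at
  WHEN NOT MATCHED THEN INSERT (order_id, status, updated_at)
    VALUES (s.order_id, s.status, s.updated_at);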

Pattern: Copy commands with poor selectivity

Executing COPY commands involves listing files from Amazon Simple Storage Service (S3). Because listing files uses only cloud services compute, executing COPY commands with poor selectivity (listing many files in order to load only a few of them) can result in high cloud services usage.

Recommendation: Consider changing the structure of your S3 bucket to include some kind of date prefix, so you list only the targeted files you need.
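
For example, with a date-prefixed bucket layout you can point the COPY command at just one day's worth of files. The table, stage, and path layout below are illustrative:

COPY INTO my_table
  FROM @my_stage/2024/06/15/
  FILE_FORMAT = (TYPE = CSV);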

Pattern: High frequency DDL operations and cloning

Data Definition Language (DDL) operations, particularly cloning, are entirely metadata operations, meaning they use only cloud services compute. Frequently creating or dropping large schemas or tables, or cloning databases for backup, can result in significant cloud services usage.

Recommendation: Cloning uses only a fraction of the resources needed to do deep copies, so you should continue to clone. Review your cloning patterns to ensure they are as granular as possible, and are not being executed too frequently. For example, you might want to clone only individual tables rather than an entire schema.
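
For example, rather than cloning an entire schema for backup, you could clone only the table that needs a backup copy (the names below are illustrative):

CREATE TABLE mydb.myschema.t1_backup CLONE mydb.myschema.t1;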

Pattern: High frequency, simple queries

The consumption of cloud services by a single simple query is negligible, but running queries such as SELECT 1, SELECT sequence1.NEXTVAL, or SELECT CURRENT_SESSION() at an extremely high frequency (tens of thousands per day) can result in significant cloud services usage.

Recommendation: Review your query frequency and determine whether the frequency is appropriately set for your use case. If you observe a high frequency of SELECT CURRENT_SESSION() queries originating from partner tools using the JDBC driver, confirm that the partner has updated their code to use the getSessionId() method in the SnowflakeConnection interface. This takes advantage of caching and reduces cloud services usage.

Pattern: High frequency INFORMATION_SCHEMA queries

Queries against the Snowflake Information Schema consume only cloud services resources. The consumption of cloud services by a single query against INFORMATION_SCHEMA views might be negligible, but running these queries at extremely high frequency (tens of thousands per day) can result in significant cloud services usage.

Recommendation: Review your query frequency and determine whether the frequency is appropriately set for your use case. Alternatively, you can query a view in the ACCOUNT_USAGE schema instead of an INFORMATION_SCHEMA view. Querying the ACCOUNT_USAGE schema uses a virtual warehouse rather than cloud services.
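
For example, instead of repeatedly querying INFORMATION_SCHEMA.TABLES, you could read similar metadata from the ACCOUNT_USAGE schema. The sketch below assumes a hypothetical schema named MY_SCHEMA; note that ACCOUNT_USAGE views have some data latency:

SELECT table_name, row_count, bytes
  FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
  WHERE table_schema = 'MY_SCHEMA'
    AND deleted IS NULL;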

Pattern: High frequency SHOW commands (by data applications and third-party tools)

SHOW commands are entirely metadata operations, meaning they consume only cloud services resources. This pattern typically occurs when an application built on top of Snowflake executes SHOW commands at a high frequency. These commands might also be initiated by third-party tools.

Recommendation: Review your query frequency and determine whether the frequency is appropriately set for your use case. In the case of partner tools, reach out to your partner to see if they have any plans to adjust their usage.

Pattern: Single row inserts and fragmented schemas (by data applications)

Snowflake is not an OLTP system, so single-row inserts are suboptimal and can consume significant cloud services resources.

Building a data application that defines one schema per customer might result in a large number of separate data loads in a given time period, which can result in high cloud services consumption.

This pattern also results in a lot more metadata that Snowflake needs to maintain, and metadata operations consume cloud services resources. Each metadata operation individually consumes minimal resources, but consumption might be significant in aggregate.

Recommendation: In general, do batch or bulk loads rather than single-row inserts.

Using a shared schema across customers is significantly more efficient, which saves costs. You’ll likely want to cluster all tables on customer_ID and use secure views to isolate each customer’s data (see the sketch below).
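
The following is a sketch of that approach. The events table, the customer_id column, and the literal filter are illustrative; in practice the filter might come from a mapping table or from CURRENT_ACCOUNT() when sharing data:

-- Cluster the shared table on the customer identifier.
ALTER TABLE events CLUSTER BY (customer_id);

-- Expose a single customer's rows through a secure view.
CREATE SECURE VIEW customer_42_events AS
  SELECT * FROM events WHERE customer_id = 42;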

Pattern: Complex SQL queries

Queries can consume significant cloud services compute if they include many joins or Cartesian products, use the IN operator with large lists, or are themselves very large. These types of queries all have high compilation times.

Recommendation: Review your queries to confirm they are doing what you intend them to do. Snowflake supports these queries and will charge you only for the resources consumed.