Working with Materialized Views¶
A materialized view is a pre-computed data set derived from a query specification (the SELECT in the view definition) and stored for later use. Because the data is pre-computed, querying a materialized view is faster than executing a query against the base table of the view. This performance difference can be significant when a query is run frequently or is sufficiently complex. As a result, materialized views can speed up expensive aggregation, projection, and selection operations, especially those that run frequently and that run on large data sets.
Materialized views are designed to improve query performance for workloads composed of common, repeated query patterns. However, materializing intermediate results incurs additional costs. As such, before creating any materialized views, you should consider whether the costs are offset by the savings from re-using these results frequently enough.
In this Topic:
Deciding When to Create a Materialized View¶
Materialized views are particularly useful when:
Query results contain a small number of rows and/or columns relative to the base table (the table on which the view is defined).
Query results contain results that require significant processing, including:
Analysis of semi-structured data.
Aggregates that take a long time to calculate.
The query is on an external table (i.e. data sets stored in files in an external stage), which might have slower performance compared to querying native database tables.
The view’s base table does not change frequently.
Advantages of Materialized Views¶
Snowflake’s implementation of materialized views provides a number of unique characteristics:
Materialized views can improve the performance of queries that use the same subquery results repeatedly.
Materialized views are automatically and transparently maintained by Snowflake. A background service updates the materialized view after changes are made to the base table. This is more efficient and less error-prone than manually maintaining the equivalent of a materialized view at the application level.
Data accessed through materialized views is always current, regardless of the amount of DML that has been performed on the base table. If a query is run before the materialized view is up-to-date, Snowflake either updates the materialized view or uses the up-to-date portions of the materialized view and retrieves any required newer data from the base table.
The automatic maintenance of materialized views consumes credits. For more details, see Maintenance Costs for Materialized Views (in this topic).
Deciding When to Create a Materialized View or a Regular View¶
In general, when deciding whether to create a materialized view or a regular view, use the following criteria:
Create a materialized view when all of the following are true:
The query results from the view don’t change often. This almost always means that the underlying/base table for the view doesn’t change often, or at least that the subset of base table rows used in the materialized view don’t change often.
The results of the view are used often (typically significantly more often than the query results change).
The query consumes a lot of resources. Typically, this means that the query consumes a lot of processing time or credits, but it could also mean that the query consumes a lot of storage space for intermediate results.
Create a regular view when any of the following are true:
The results of the view change often.
The results are not used often (relative to the rate at which the results change).
The query is not resource intensive so it is not costly to re-run it.
These criteria are just guidelines. A materialized view might provide benefits even if it is not used often — especially if the results change less frequently than the usage of the view.
Also, there are other factors to consider when deciding whether to use a regular view or a materialized view.
For example, the cost of storing the materialized view is a factor; if the results are not used very often (even if they are used more often than they change), then the additional storage costs might not be worth the performance gain.
Comparison with Tables, Regular Views, and Cached Results¶
Materialized views are similar to tables in some ways and similar to regular (i.e. non-materialized) views in other ways. In addition, materialized views have some similarities with cached results, particularly because both enable storing query results for future re-use.
This section describes some of the similarities and differences between these objects in specific areas, including:
Reduced query logic complexity.
Data clustering (related to query performance).
Storage and maintenance costs.
Snowflake caches query results for a short period of time after a query has been run. In some situations, if the same query is re-run and if nothing has changed in the table(s) that the query accesses, then Snowflake can simply return the same results without re-running the query. This is the fastest and most efficient form of re-use, but also the least flexible. For more details, see Using Persisted Query Results.
Both materialized views and cached query results provide query performance benefits:
Materialized views are more flexible than, but typically slower than, cached results.
Materialized views are faster than tables because of their “cache” (i.e. the query results for the view); in addition, if data has changed, they can use their “cache” for data that hasn’t changed and use the base table for any data that has changed.
Regular views do not cache data, and therefore cannot improve performance by caching. However, in some cases, views help Snowflake generate a more efficient query plan. Also, both materialized views and regular views enhance data security by allowing data to be exposed or hidden at the row level or column level.
The following table shows key similarities and differences between tables, regular views, cached query results, and materialized views:
Simplifies Query Logic
Uses Credits for Maintenance
Cached query result
Used only if data has not changed and if query only uses deterministic functions (e.g. not CURRENT_DATE).
Storage and maintenance requirements typically result in increased costs.
Examples of Use Cases For Materialized Views¶
This section describes some general usage scenarios that also provide a conceptual overview of materialized views:
Suppose that, every day, you run a query
Qthat includes a subquery
Sis resource-intensive and queries data that changes only once a week, then you could improve performance of the outer query
Sand caching the results in a table named
You would update the table only once a week.
The rest of the time, when you run
Q, it would reference the subquery results of
Sthat were stored in the table.
This would work well as long as the results of subquery
Schange predictably (e.g. at the same time every week).
However, if the results of
Schange unpredictably then caching the results in a table is risky; sometimes your main query
Qwill return out-of-date results if the results of subquery
Sare out of date (and thus the results of cached table
CTare out of date).
Ideally, you’d like a special type of cache for results that change rarely, but for which the timing of the change is unpredictable. Looking at it another way, you’d like to force your subquery
Sto be re-run (and your cache table
CTto be updated) when necessary.
A materialized view implements an approximation of the best of both worlds. You define a query for your materialized view, and the results of the query are cached (as though they were stored in an internal table), but Snowflake updates the cache when the table that the materialized view is defined on is updated. Thus, your subquery results are readily available for fast performance.
As a less abstract example, suppose that you run a small branch of a large pharmacy, and your branch stocks hundreds of medications out of a total of tens of thousands of FDA-approved medications.
Suppose also that you have a complete list of all medications that each of your customers takes, and that almost all of those customers order only medicines that are in stock (i.e. special orders are rare).
In this scenario, you could create a materialized view that lists only the interactions among medicines that you keep in stock. When a customer orders a medicine that she has never used before, if both that medicine and all of the other medicines that she takes are covered by your materialized view, then you don’t need to check the entire FDA database for drug interactions; you can just check the materialized view, so the search is faster.
You can use a materialized view by itself, or you can use it in a join.
Continuing with the pharmacy example, suppose that you have one table that lists all of the medicines that each of your customers takes; you can join that table to the materialized view of drug interactions to find out which of the customer’s current medications might interact with the new medication.
You might use an outer join to make sure that you list all of the customer’s medicines, whether or not they are in your materialized view; if the outer join shows that any of the current medicines are not in the materialized view, you can re-run the query on the full drug interactions table.
How the Query Optimizer Uses Materialized Views¶
You don’t need to specify a materialized view in an SQL statement in order for the view to be used. The query optimizer can automatically rewrite queries against the base table to use the materialized view instead. (The optimzer can also rewrite queries to use regular views instead of the base table.)
For example, suppose that a materialized view contains all of the rows and columns that are needed by a query against a base table. The optimizer can decide to rewrite the query to use the materialized view, rather than the base table. This can dramatically speed up a query, especially if the base table contains a large amount of historical data.
As another example, in a multi-table join, the optimizer might decide to use a materialized view instead of a table for one of the tables in the join.
Even if a materialized view can replace the base table in a particular query, the optimizer might not use the materialized view. For example, if the base table is clustered by a field, the optimizer might choose to scan the base table (rather than the materialized view) because the optimizer can effectively prune out partitions and provide equivalent performance using the base table.
A materialized view can also be used as the data source for a subquery.
When the optimizer chooses to use a materialized view implicitly, the materialized view is listed in the EXPLAIN plan or the Query Profile instead of the base table. You can use this information to experiment and understand which queries can benefit from existing materialized views.
About Materialized Views in the Snowflake Database¶
The next sections explain how materialized views are represented in the Snowflake database.
DDL Commands For Materialized Views¶
Materialized views are first-class database objects. Snowflake provides the following DDL commands for creating and maintaining materialized views:
DML Operations on Materialized Views¶
Snowflake does not allow standard DML (e.g. INSERT, UPDATE, DELETE) on materialized views. Snowflake does not allow users to truncate materialized views.
See Limitations on Working With Materialized Views (in this topic) for details.
Access Control Privileges¶
There are three types of privileges that are related to materialized views:
Privileges on the schema that contains the materialized view.
Privileges directly on the materialized view itself.
Privileges on the database objects (e.g. tables) that the materialized view accesses.
You can use the standard commands for granting and revoking privileges on materialized views:
Privileges on a Materialized View’s Schema¶
Materialized views consume storage space. To create a materialized view, you need the CREATE MATERIALIZED VIEW privilege on the schema that will contain the materialized view. You’ll need to execute a statement similar to:
GRANT CREATE MATERIALIZED VIEW ON SCHEMA <schema_name> TO ROLE <role_name>;
For more details about the GRANT statement, see GRANT <privileges> … TO ROLE.
Privileges on a Materialized View¶
Materialized Views, like other database objects (tables, views, UDFs, etc.), are owned by a role and have privileges that can be granted to other roles.
You can grant the following privileges on a materialized view:
As with non-materialized views, a materialized view does not automatically inherit the privileges of its base table. You should explicitly grant privileges on the materialized view to the roles that should use that view.
The exception to this rule is when the query optimizer rewrites a query against the base table to use the materialized view (as explained in How the Query Optimizer Uses Materialized Views). In this case, the user does not need privileges to use the materialized view in order to access the results of the query.
Privileges on the Database Objects Accessed by the Materialized View¶
As with non-materialized views, a user who wishes to access a materialized view needs privileges only on the view, not on the underlying object(s) that the view references.
Secure Materialized Views¶
Materialized views can be secure views.
Most information about secure views applies to secure materialized views. There are a few cases where secure materialized views are different from secure non-materialized views. The differences include:
The command to find out whether a view is secure.
For non-materialized views, check the
IS_SECUREcolumn in the output of the
For materialized views, check the
IS_SECUREcolumn in the output of the
SHOW MATERIALIZED VIEWScommand.
For more information about secure views, see Working with Secure Views.
The syntax to create secure materialized views is documented at CREATE MATERIALIZED VIEW.
Creating and Working With Materialized Views¶
This section provides information about creating and working with materialized views.
Planning to Create a Materialized View¶
When deciding to create a materialized view, consider doing some analysis to determine the need for the view:
Examine the filters, projections, and aggregations of queries that are frequent or expensive.
Use the Query Profile and the EXPLAIN command to see whether existing materialized views are already being used by the automatic query rewrite feature. You might find that you do not need to create any new materialized views if there are existing views that fit the queries well.
Before adding any materialized views, record current query costs and performance so that you can evaluate the difference after creating the new materialized view.
If you find queries with very selective filters that do not benefit from clustering the table, then a materialized view containing the same filters can help the queries avoid scanning a lot of data.
Similarly, if you find queries that use aggregation, or that contain expressions that are very expensive to evaluate (for example, expensive function calls, or expensive operations on semi-structured data), then a materialized views that uses the same expression(s) or aggregation(s) can provide a benefit.
Run the EXPLAIN command against the original queries, or run the queries and check the Query Profile, to see whether the new materialized view is being used.
Monitor the combined query and materialized view costs, and evaluate whether the performance or cost benefits justify the cost of the materialized view’s maintenance.
Examine the query costs of the base table as well. In cases where the optimizer can rewrite the query to use a materialized view, query compilation can consume more time and resources. (The optimizer has a larger number of possibilities to consider.)
Remember that you can always reference materialized views directly if it simplifies your queries or you know that a materialized view will give you better performance. However, in most cases, you can simply query the base table and the automatic query rewrite feature will do that for you.
Creating a Materialized View¶
Note the following:
Whenever possible, use the fully-qualified name for the base table referenced in a materialized view. This insulates the view from changes that can invalidate the view, such as moving the base table to a different schema from the view (or vice versa).
If the name of the base table is not qualified, and the table or view is moved to a different schema, the reference becomes invalid.
If you specify a filter when creating a materialized view (e.g.
WHERE column_1 BETWEEN Y and Z), the optimizer can use the materialized view for queries against the base table that have the same filter or a more restrictive filter. Here are some examples:
Here’s a simple example of range subsumption. In this example, the filter in the query does not match the filter in the materialized view. However, the filter in the query selects only rows that are in the materialized view, so the optimizer can choose to scan the materialized view rather than the entire table.
-- Example of a materialized view with a range filter create materialized view v1 as select * from table1 where column_1 between 100 and 400;
-- Example of a query that might be rewritten to use the materialized view select * from table1 where column_1 between 200 and 300;
This example shows OR subsumption. The materialized view contains all the rows that the subsequent query needs.
Define a materialized view that contains all rows that have either value X or value Y:
create materialized view mv1 as select * from tab1 where column_1 = X or column_1 = Y;
Define a query that looks only for value Y (which is included in the materialized view):
select * from tab1 where column_1 = Y;
The query above can be rewritten internally as:
select * from mv1 where column_1 = Y;
This example is another example of OR subsumption. There’s no explicit
ORin the materialized view definition. However, an
INclause is equivalent to a series of
ORexpressions, so the optimizer can re-write this query the same way as it re-wrote the OR subsumption example above:
create materialized view mv1 as select * from tab1 where column_1 in (X, Y);
Define a query that looks only for value Y (which is included in the materialized view):
select * from tab1 where column_1 = Y;
The query above can be rewritten internally as:
select * from mv1 where column_1 = Y;
This example uses AND subsumption:
Create a materialized view that contains all rows where
column_1 = X.
create materialized view mv2 as select * from table1 where column_1 = X;
Create a query:
select column_1, column_2 from table1 where column_1 = X AND column_2 = Y;
The query can be rewritten as:
select * from mv2 where column_2 = Y;
The rewritten query does not even need to include the expression
column_1 = Xbecause the materialized view’s definition already requires that all rows match
column_1 = X.
The following example shows aggregate subsumption:
The materialized view is defined below:
create materialized view mv4 as select column_1, column_2, sum(column_3) from table1 group by column_1, column_2;
The following query can use the materialized view defined above:
select column_1, sum(column_3) from table1 group by column_1;
The query can be rewritten as:
select column_1, sum(column_3) from mv4 group by column_1;
The rewritten query does not take advantage of the additional grouping by column_2, but the rewritten query is not blocked by that additional grouping, either.
The CREATE MATERIALIZED VIEW statement might take a substantial amount of time to complete.
When a materialized view is first created, Snowflake performs the equivalent of a CTAS (CREATE TABLE … AS ….) operation.
Limitations on Creating Materialized Views¶
These are current limitations; some of them might be removed or changed in future versions.
The following limitations apply to creating materialized views:
A materialized view can query only a single table.
Joins, including self-joins, are not supported.
A materialized view cannot query:
A materialized view.
A non-materialized view.
A UDTF (user-defined table function).
A materialized view cannot include:
ORDER BY clause.
GROUP BY keys that are not within the SELECT list. All GROUP BY keys in a materialized view must be part of the SELECT list.
GROUP BY GROUPING SETS.
GROUP BY ROLLUP.
GROUP BY CUBE.
Nesting of subqueries within a materialized view.
Many aggregate functions are not allowed in a materialized view definition.
The aggregate functions that are supported in materialized views are:
The other aggregate functions are not supported in materialized views.
Aggregate functions that are allowed in materialized views still have some restrictions:
Aggregate functions cannot be nested.
Aggregate functions used in complex expressions (e.g.
(sum(salary)/10)) can only be used in the outer-most level of a query, not in a subquery or an in-line view.
For example, the following is allowed:
create materialized view mv1 as select sum(x) + 100 from t;
The following is not allowed:
create materialized view mv2 as select y + 10 from ( select sum(x) as y from t );
DISTINCT cannot be combined with aggregate functions.
In a materialized view, the aggregate functions
SUMcan be used as aggregate functions but not as window functions. In a materialized view, these functions cannot be used with the
OVER ( [ PARTITION BY <expr1> ] [ ORDER BY <expr2> ] )
If an aggregate function is in a subquery, the materialized view cannot create an expression on top of the aggregated column(s) from that subquery. For example, consider the following materialized view definition:
create or replace materialized view mv1 as select c1 + 10 as c1new, c2 from (select sum(c1) as c1, c2 from t group by c2);
The expression “c1 + 10” is an expression on top of an aggregate function in a subquery, and therefore causes an error message.
Note that even an equality operator counts as an expression, which means that
CASEexpressions using columns that represent aggregate functions in a subquery are also prohibited.
To work around this limitation, create a materialized view without the expression, and then create a non-materialized view that includes the expression, for example:
create or replace materialized view mv1 as select c1, c2 from (select sum(c1) as c1, c2 from t group by c2); create or replace view expr_v1 as select c1 + 10 as c1new, c2 from (select c1, c2 from mv1);
A materialized view should not be defined using a function that produces different results for different settings of parameters, such as the session-level parameter TIMESTAMP_TYPE_MAPPING.
For example, suppose that a view is defined as follows:
create materialized view bad_example (ts1) as select to_timestamp(n) from t1;
The data type of the return value from
TO_TIMESTAMP(n)depends upon the parameter TIMESTAMP_TYPE_MAPPING, so the contents of the materialized view depend upon the value of TIMESTAMP_TYPE_MAPPING at the time that the view was created.
When a materialized view is created, the expression defining each of its columns is evaluated and stored. If a column definition depends upon a particular session variable, and the session variable changes, the expression is not re-evaluated, and the materialized view is not updated. If the materialized view depends upon a particular value of a session variable, and if the session variable’s value has changed, then queries on the materialized view fail.
To avoid this problem, force the expression to a value that does not depend upon any session variables. The example below casts the output to a particular data type, independent of the TIMESTAMP_TYPE_MAPPING parameter:
create materialized view good_example (ts1) as select to_timestamp(n)::TIMESTAMP_NTZ from t1;
This issue is specific to materialized views. Non-materialized views generate their output dynamically based on current parameter settings, so the results can’t be stale.
Snowflake’s “Time Travel” feature is not supported on materialized views.
Basic Example: Creating a Materialized View¶
This section contains a basic example of creating and using a materialized view:
CREATE OR REPLACE MATERIALIZED VIEW mv1 AS SELECT My_ResourceIntensive_Function(binary_col) FROM table1; SELECT * FROM mv1;
More detailed examples are provided in Examples (in this topic).
Working With a Materialized View¶
After you create a materialized view, a background process automatically maintains the data in the materialzed view. Note the following:
Maintenance of materialized views is performed by a background process and the timing is not predictable by the user. If maintenance falls behind, queries might run more slowly than they run when the views are up-to-date. However, the results will always be correct; if some micro-partitions of the materialized view are out of date, Snowflake skips those partitions and looks up data in the base table.
To see the last time that a materialized view was refreshed, check the
BEHIND_BYcolumns in the output of the command
SHOW MATERIALIZED VIEWS.
If you suspend maintenance of a view, you should not query the view until you resume maintenance.
The following command and view provide information about materialized views:
The SHOW VIEWS command returns information about both materialized and regular views.
INFORMATION_SCHEMA.TABLES shows materialized views. The
TABLE_TYPEcolumn shows “MATERIALIZED VIEW”. The
IS_INSERTABLEcolumn is always “NO”, because you cannot insert directly into a materialized view.
INFORMATION_SCHEMA.VIEWS does not show materialized views. Materialized views are shown by INFORMATION_SCHEMA.TABLES.
Limitations on Working With Materialized Views¶
These are current limitations; some of them might be removed or changed in future versions.
The following limitations apply to using materialized views:
To ensure that materialized views stay consistent with the base table on which they are defined, you cannot perform most DML operations on a materialized view itself. For example, you cannot insert rows directly into a materialized view (although of course you can insert rows into the base table). The prohibited DML operations include:
Truncating a materialized view is not supported.
You cannot directly clone a materialized view by using the
CREATE VIEW ... CLONE...command. However, if you clone a schema or a database that contains a materialized view, the materialized view will be cloned and included in the new schema or database.
Time Travel is not currently supported on materialized views.
Materialized Views are not monitored by Snowflake Working with Resource Monitors.
Effects of Changes to Base Tables on Materialized Views¶
In Snowflake, changes to a base table are not automatically propagated to materialized views based on that table.
If columns are added to the base table, those new columns are not automatically propagated to the materialized views. This is true even if the materialized view was defined with
CREATE MATERIALIZED VIEW AS SELECT * FROM table2 ...). The columns of the materialized view are defined at the time that the materialized view is defined; the
SELECT *is not interpreted dynamically each time that the materialized view is queried. To avoid confusion, Snowflake recommends not using
SELECT *in the definition of a materialized view.
If a base table is altered so that existing columns are changed or dropped, then all materialized views on that base table are suspended; the materialized views cannot be used or maintained. (This is true even if the modified or dropped column was not part of the materialized view.) You cannot RESUME that materialized view. If you want to use it again, you must re-create it. Although you can drop it, create a new view, and then run GRANT and REVOKE commands to re-create the privileges on the view, the most efficient way to re-create the view is usually to replace it in a single command by executing
CREATE OR REPLACE VIEW <view_name> ... COPY GRANTS ....
ALTER TABLE ... ADD <column>does not suspend a materialized view created on that base table.
If a base table is dropped, the materialized view is suspended (but not automatically dropped). In most cases, the materialized view must be dropped. If for some reason you are re-creating the base table and would also like to re-create the materialized view with the same definition it had previously, then first re-create the base table, and then after replace the view by using
CREATE OR REPLACE VIEW <view_name> ... COPY GRANTS ....
Materialized Views in Cloned Schemas and Databases¶
If you clone a schema or a database that contains a materialized view, then the materialized view is cloned.
If you clone the materialized view and the corresponding base table at the same time (as part of the same
CREATE SCHEMA ... CLONE or
CREATE DATABASE ... CLONE operation), then the cloned materialized view
refers to the cloned base table.
If you clone the materialized view without cloning the base table (e.g. if the table is in Database1.Schema1 and the view is in Database1.Schema2, and you clone only Schema2 rather than all of Database1), then the cloned view will refer to the original base table.
Cloning only the materialized view can be useful if different users want to “tune” the view differently. For example, you can cluster the cloned view on a different column(s) than you clustered the original view.
Maintenance Costs for Materialized Views¶
Materialized views impact your costs for both storage and compute resources:
Storage: Each materialized view stores query results, which adds to the monthly storage usage for your account.
Compute resources: In order to prevent materialized views from becoming out-of-date, Snowflake performs automatic background maintenance of materialized views. When a base table changes, all materialized views defined on the table are updated by a background service that uses compute resources provided by Snowflake.
These updates can consume significant resources, resulting in increased credit usage. However, Snowflake ensures efficient credit usage by billing your account only for the actual resources used. Billing is calculated in 1-second increments.
Estimating and Controlling Costs¶
There are no tools to estimate the costs of maintaining materialized views. In general, the costs are proportional to:
The number of materialized views created on each base table, and the amount of data that changes in each of those materialized views when the base table changes. Any changes to micro-partitions in the base table require eventual materialized view maintenance, whether those changes are due to reclustering or DML statements run on the base table.
The number of those materialized views that are clustered. Maintaining clustering (of either a table or a materialized view) adds costs.
If a materialized view is clustered differently from the base table, the number of micropartitions changed in the materialized view might be substantially larger than the number of micropartitions changed in the base table.
For example, consider the case where the base table is changed largely by inserting (appending) data, and is not clustered, so the base table is largely in the order that the rows were inserted into the table. Imagine that the materialized view is clustered by an independent column, for example, postal code. If 100 new rows are added to the base table, those might go into one or two new micropartitions, leaving the other micropartitions in the base table untouched. But those 100 rows might require re-writing 100 micropartitions in the clustered materialized view.
As another example, consider deletes. Deleting the oldest rows in an unclustered base table might delete only the oldest micropartitions, but might require changes to a far larger number of micropartitions in a materialized view that is not clustered by age.
(For more details about clustering materialized views, see Materialized Views and Clustering.)
You can control the cost of maintaining materialized views by carefully choosing how many views to create, which tables to create them on, and each view’s definition (including the number of rows and columns in that view).
You can also control costs by suspending or resuming the materialized view; however, suspending maintenance typically only defers costs, rather than reducing them. The longer that maintenance has been deferred, the more maintenance there is to do.
If you are concerned about the cost of maintaining materialized views, Snowflake recommends starting slowly with this feature (i.e. create only a few materialized views on selected tables) and monitor the costs over time.
You can view the billing costs for maintaining materialized views using either the web interface or SQL:
- Web Interface
Query either of the following:
SELECT * FROM TABLE(INFORMATION_SCHEMA.MATERIALIZED_VIEW_REFRESH_HISTORY());
Resource monitors provide control over virtual warehouse credit usage; however, you cannot use them to control credit usage for the Snowflake-provided warehouses, including the MATERIALIZED_VIEW_MAINTENANCE warehouse.
Materialized Views and Clustering¶
Defining a clustering key on a materialized view is supported and can increase performance in many situations. However, it also adds costs.
If you cluster both the materialized view(s) and the base table on which the materialized view(s) are defined, you can cluster the materialized view(s) on different columns from the columns used to cluster the base table.
In most cases, clustering a subset of the materialized views on a table tends to be more cost-effective than clustering the table itself. If the data in the base table is accessed (almost) exclusively through the materialized views, and (almost) never directly through the base table, then clustering the base table adds costs without adding benefit.
If you are considering clustering both the base table and the materialized views, Snowflake recommends that you start by clustering only the materialized views, and that you monitor performance and cost before and after adding clustering to the base table.
If you plan to create a table, load it, and create a clustered materialized view(s) on the table, then Snowflake recommends that you create the materialized views last (after loading as much data as possible). This can save money on the initial data load, because it avoids some extra effort to maintain the clustering of the materialized view the first time that the materialized view is loaded.
Best Practices for Materialized Views¶
Best Practices for Creating Materialized Views¶
Most materialized views should do one or both of the following:
Filter data. You can do this by:
Filtering rows (e.g. defining the materialized view so that only very recent data is included). In some applications, the best data to store is the abnormal data. For example, if you are monitoring pressure in a gas pipeline to estimate when pipes might fail, you might store all pressure data in the base table, and store only unusually high pressure measurements in the materialized view. Similarly, if you are monitoring network traffic, your base table might store all monitoring information, while your materialized view might store only unusual and suspicious information (e.g. from IP addresses known to launch DOS (Denial Of Service) attacks).
Filtering columns (e.g. selecting specific columns rather than “SELECT * …”). Using
SELECT * ...to define a materialized view typically is expensive. It can also lead to future errors; if columns are added to the base table later (e.g.
ALTER TABLE ... ADD COLUMN ...), the materialized view does not automatically incorporate the new columns.
Perform resource-intensive operations and store the results so that the resource intensive operations don’t need to be performed as often.
You can create more than one materialized view for the same base table. For example, you can create one materialized view that contains just the most recent data, and another materialized view that stores unusual data. You can then create a non-materialized view that joins the two tables and shows recent data that matches unusual historical data so that you can quickly detect unusual situations, such as a DOS (denial of service) attack that is ramping up.
Snowflake recommends materialized views for unusual data only when:
The base table is not clustered, or the columns that contain the unusual data are not already part of the base table’s clustering key.
The data is unusual enough that it is easy to isolate, but not so unusual that it is rarely used. (If the data is rarely used, the cost of maintaining the materialized view is likely to outweigh the performance benefit and cost savings from being able to access it quickly when it is used.)
Best Practices for Maintaining Materialized Views¶
Snowflake recommends batching DML operations on the base table:
DELETE: If tables store data for the most recent time period (e.g. the most recent day or week or month), then when you trim your base table by deleting old data, the changes to the base table are propagated to the materialized view. Depending upon how the data is distributed across the micro-partitions, this could cause you to pay more for background updates of the materialized views. In some cases, you might be able to reduce costs by deleting less frequently (e.g. daily rather than hourly, or hourly rather than every 10 minutes).
If you do not need to keep a specific amount of old data, you should experiment to find the best balance between cost and functionality.
MERGE: Batching these types of DML statements on the base table can reduce the cost of maintaining the materialized views.
Best Practices for Clustering Materialized Views and their Base Tables¶
If you create a materialized view on a base table, and if the materialized views are accessed frequently and the base table is not accessed frequently, it is usually more efficient to avoid clustering the base table. If you create a materialized view on a clustered table, consider removing any clustering on the base table, because any change to the clustering of the base table will eventually require a refresh of the materialized view, which adds to the materialized view’s maintenance costs.
Clustering materialized views, especially materialized views on base tables that change frequently, increases costs. Do not cluster more materialized views than you need to.
Almost all information about clustering tables also applies to clustering materialized views. For more information about clustering tables, see Strategies for Selecting Clustering Keys.
This section contains additional examples of creating and using materialized views. For a simple, introductory example, see Basic Example: Creating a Materialized View (in this topic).
Simple Materialized View¶
This first example illustrates a simple materialized view and a simple query on the view.
Create the table and load the data, and create the view:CREATE TABLE inventory (product_ID INTEGER, wholesale_price FLOAT, description VARCHAR); CREATE OR REPLACE MATERIALIZED VIEW mv1 AS SELECT product_ID, wholesale_price FROM inventory; INSERT INTO inventory (product_ID, wholesale_price, description) VALUES (1, 1.00, 'cog');
Select data from the view:SELECT product_ID, wholesale_price FROM mv1; +------------+-----------------+ | PRODUCT_ID | WHOLESALE_PRICE | |------------+-----------------| | 1 | 1 | +------------+-----------------+
Joining a Materialized View¶
You can join a materialized view with a table or another view. This example builds on the previous example by creating an additional table, and then a non-materialized view that shows profits by joining the materialized view to a table:
CREATE TABLE sales (product_ID INTEGER, quantity INTEGER, price FLOAT); INSERT INTO sales (product_ID, quantity, price) VALUES (1, 1, 1.99); CREATE or replace VIEW profits AS SELECT m.product_ID, SUM(IFNULL(s.quantity, 0)) AS quantity, SUM(IFNULL(quantity * (s.price - m.wholesale_price), 0)) AS profit FROM mv1 AS m LEFT OUTER JOIN sales AS s ON s.product_ID = m.product_ID GROUP BY m.product_ID;
Select data from the view:SELECT * FROM profits; +------------+----------+--------+ | PRODUCT_ID | QUANTITY | PROFIT | |------------+----------+--------| | 1 | 1 | 0.99 | +------------+----------+--------+
Suspending Updates to a Materialized View¶
The following example temporarily suspends the use (and maintenance)
mv1 materialized view, and shows that queries on that view
will generate an error message while the materialized view is suspended:
ALTER MATERIALIZED VIEW mv1 SUSPEND; INSERT INTO inventory (product_ID, wholesale_price, description) VALUES (2, 2.00, 'sprocket'); INSERT INTO sales (product_ID, quantity, price) VALUES (2, 10, 2.99), (2, 1, 2.99);
Select data from the materialized view:SELECT * FROM profits ORDER BY product_ID;
Output:002037 (42601): SQL compilation error: Failure during expansion of view 'PROFITS': SQL compilation error: Failure during expansion of view 'MV1': SQL compilation error: Materialized View MV1 is invalid.
Resume:ALTER MATERIALIZED VIEW mv1 RESUME;
Select data from the materialized view:SELECT * FROM profits ORDER BY product_ID; +------------+----------+--------+ | PRODUCT_ID | QUANTITY | PROFIT | |------------+----------+--------| | 1 | 1 | 0.99 | | 2 | 11 | 10.89 | +------------+----------+--------+
Clustering a Materialized View¶
This example creates a materialized view and then later clusters it:
These statements create two tables that track information about segments of a pipeline (e.g. for natural gas).
The segments that are most likely to fail in the near future are often the segments that are oldest, or that are made of materials that corrode easily, or that had experienced periods of unusually high pressure, so this example tracks each pipe’s age, pressure, and material (iron, copper, PVC plastic, etc.).CREATE TABLE pipeline_segments ( segment_ID BIGINT, material VARCHAR, -- e.g. copper, cast iron, PVC. installation_year DATE, -- older pipes are more likely to be corroded. rated_pressure FLOAT -- maximum recommended pressure at installation time. ); INSERT INTO pipeline_segments (segment_ID, material, installation_year, rated_pressure) VALUES (1, 'PVC', '1994-01-01'::DATE, 60), (2, 'cast iron', '1950-01-01'::DATE, 120) ; CREATE TABLE pipeline_pressures ( segment_ID BIGINT, pressure_psi FLOAT, -- pressure in Pounds per Square Inch measurement_timestamp TIMESTAMP ); INSERT INTO pipeline_pressures (segment_ID, pressure_psi, measurement_timestamp) VALUES (2, 10, '2018-09-01 00:01:00'), (2, 95, '2018-09-01 00:02:00') ;
The pipeline segments don’t change very frequently, and the oldest pipeline segments are the segments most likely to fail, so create a materialized view of the oldest segments.CREATE MATERIALIZED VIEW vulnerable_pipes (segment_ID, installation_year, rated_pressure) AS SELECT segment_ID, installation_year, rated_pressure FROM pipeline_segments WHERE material = 'cast iron' AND installation_year < '1980'::DATE;
You can add clustering or change the clustering key. For example, to cluster on
installation_year:ALTER MATERIALIZED VIEW vulnerable_pipes CLUSTER BY (installation_year);
New pressure measurements arrive frequently (perhaps every 10 seconds), so maintaining a materialized view on the pressure measurements would be expensive. Therefore, even though high performance (fast retrieval) of recent pressure data is important, the
pipeline_pressurestable starts without a materialized view.
If performance is too slow, you can create a materialized view that contains only recent pressure data, or that contains data only about abnormal high-pressure events.
Create a (non-materialized) view that combines information from the materialized view and the
pipeline_pressurestable:CREATE VIEW high_risk AS SELECT seg.segment_ID, installation_year, measurement_timestamp::DATE AS measurement_date, DATEDIFF('YEAR', installation_year::DATE, measurement_timestamp::DATE) AS age, rated_pressure - age AS safe_pressure, pressure_psi AS actual_pressure FROM vulnerable_pipes AS seg INNER JOIN pipeline_pressures AS psi ON psi.segment_ID = seg.segment_ID WHERE pressure_psi > safe_pressure ;
Now list the high-risk pipeline segments:SELECT * FROM high_risk; +------------+-------------------+------------------+-----+---------------+-----------------+ | SEGMENT_ID | INSTALLATION_YEAR | MEASUREMENT_DATE | AGE | SAFE_PRESSURE | ACTUAL_PRESSURE | |------------+-------------------+------------------+-----+---------------+-----------------| | 2 | 1950-01-01 | 2018-09-01 | 68 | 52 | 95 | +------------+-------------------+------------------+-----+---------------+-----------------+
This shows that the pipeline segment with
segment_id = 2, which is made of a material that corrodes, is old. This segment has never experienced pressure higher than the maximum pressure rating at the time it was installed, but because of the potential for corrosion, its “safe limit” has declined over time, and the highest pressure it has experienced is higher than the pressure that was recommended for a pipe as old as the pipe was at the time of the pressure measurement.
Failure during expansion of view '<name>': SQL compilation error: Materialized View <name> is invalid.¶
In many cases, this is caused by a change to the underlying table that the materialized view is based on. For example, this error is returned if the table is dropped or if the materialized view refers to a table column, but the column has been dropped.
If the table has been dropped and is not going to be re-created, then you probably should drop the view.
If the table has been modified, but still exists, you might be able to drop and re-create the materialized view, using the columns that remain.
SHOW MATERIALIZED VIEWS Command Shows Materialized Views that are Not Updated¶
One possible cause is that the refresh failed because the SELECT in the view definition failed.
SELECTfails during the refresh, then the refresh will fail; however, because the refresh is done behind the scenes, the user will not see an error message at the time the refresh is attempted.
If some or all of the data in the materialized view is out of date, then Snowflake will retrieve up-to-date data from the base table; Snowflake will not issue an error message that the materialized view was not refreshed.
Therefore, neither the refresh nor subsequent queries necessarily shows that the
SELECTin the view failed. To detect whether refreshes are failing, use the command SHOW MATERIALIZED VIEWS and look for the column named
refreshed_on. If the data is not getting refreshed, then the
SELECTmight be failing.
Make sure that the underlying table exists. If you drop the table that the view is defined on, but you don’t drop the view, then the view will continue to exist.
In some cases, you might be able to debug the problem by manually running the
SELECTstatement in the materialized view’s definition, or by running a simpler (less expensive)
SELECTon the table referenced in the materialized view’s definition.