Introduction to data quality and data metric functions¶
Data quality uses data metric functions (DMFs), which include Snowflake-provided system DMFs and user-defined DMFs, to monitor the state and integrity of your data. You can use DMFs to measure key metrics, such as, but not limited to, freshness and counts that measure duplicates, NULLs, rows, and unique values.
About data quality and DMFs¶
Data quality focuses on knowing the state and integrity of your data, which includes data freshness and accuracy with respect to true data values compared to null values or blank fields in a column, to make data-driven decisions. You can measure the quality of your data by using DMFs. Snowflake provides built-in system DMFs in the SNOWFLAKE.CORE schema to measure common metrics without having to define them. You can also define your own custom DMFs to fine-tune your data quality measurements more precisely, and these DMFs are stored in the database and schema of your choice.
Whether you use system DMFs, custom DMFs, or both, after you assign a DMF to a table or view, Snowflake records the results of scheduling the DMF in a dedicated event table for data metric functions. You can specify the frequency for how often the DMF is called. For example, you can schedule the DMFs on a particular table to run three times daily. You can modify the frequency as needed based on your own internal data quality requirements. All DMFs that are set on the table follow the same schedule.
After you schedule the DMFs to run, you can configure alerts to notify you when changes to data quality occur. By combining the DMF and alert functionality, you can have consistent threshold notifications for data quality on the tables that you measure. These insights enhance your data governance posture by enabling the following:
Data stewards to know the current state of their data based on a particular metric.
Data engineers to take immediate action on important tables and views.
Platform administrators to ensure data quality monitoring is done with cost, consistency, and performance.
The data quality workflow of defining, measuring, and monitoring data can then be applied to additional workloads.
Supported table kinds¶
You can set a DMF on the following kinds of table objects:
Dynamic table
Event table
External table
Apache Iceberg™ table
Materialized view
Table (CREATE TABLE), including temporary and transient tables
View
You cannot set a DMF on a hybrid table or a stream object.
Billing and pricing¶
Snowflake creates and manages the virtual warehouse objects to support this feature. You can use this feature without having to provision or use any additional virtual warehouse resources outside of your normal SQL query workflow.
Calling a DMF uses serverless compute resources as described in more detail in the Snowflake Service Consumption Table. The table lists the credit pricing per cloud region and Snowflake Edition.
The credits that you use are listed in the “Data Quality Monitoring” category on your monthly bill. These credits include compute consumed by all system or user-defined data quality metrics that you use. You are not billed for creating a DMF.
Billing occurs only when a scheduled DMF is computed on an object. You are not billed for unscheduled data metric function usage, such as calling a DMF with a SELECT statement.
The logging infrastructure consolidates metric outputs in the event table. Consumption incurred by the logging service shows up on your monthly bill as “Logging.”
For more information, see Serverless credit usage.
Tip
You can query the DATA_QUALITY_MONITORING_USAGE_HISTORY to track your credit consumption related to using DMFs in your account.
Benefits¶
Using DMFs enhances your data quality efforts and provides these benefits:
- Facilitate compliance:
By knowing the state of your data, it’s easier to demonstrate how you adhere to compliance and regulatory standards. This decreases your risk and enhances your data governance posture.
- Meet service-level agreements (SLAs):
Accurate data metrics, such as freshness, help to meet SLAs between data providers, data consumers, and clients.
- Credibility:
DMFs provide data validation, which facilitates trustworthy data-driven decision making.
- Consistency:
Using system DMFs and repeatedly using proper custom DMFs leads to consistent data quality evaluation over time. This enhances the credibility of your data.
- Optimize for specific use cases:
Custom DMFs enable data engineers to design accurate metrics to measure the data, which leads to more accurate optimizations for targeted applications of the data.
- Automated measurement:
Calling the DMF is automated after you assign the DMF to a table or view, and specify the schedule for the DMF to run. No additional work is needed to actively measure your data quality. For more information, see Schedule the DMF to run and View results of a data metric function
- Performance management:
Snowflake evaluates how you use DMFs and optimizes the query evaluation to provide optimal query performance.
Considerations¶
For information about replication and DMFs, see Replication of data metric functions (DMFs).
To set a DMF on a table, the table owner role must be granted the global (account-level) EXECUTE DATA METRIC FUNCTION privilege. Therefore, the table owner role should be a user-defined custom role that you created with the CREATE ROLE command or a system role, such as the SYSADMIN role.
You cannot grant global privileges to database roles because database roles are scoped to the database in which they exist. If you have a table that is owned by a database role and you want to set a DMF on that table, you must transfer the OWNERSHIP privilege of the table to a custom role or system role. For more information, see GRANT OWNERSHIP.
Limitations¶
Note the following limitations when using DMFs:
Setting DMFs on objects:
You can only have 10,000 total associations of DMFs on objects per account. Each instance of setting a DMF on a table or view counts as one association.
-
You cannot grant privileges on a DMF to share or set a DMF on a shared table or view.
Setting a DMF on an object tag is not supported.
You cannot set a DMF on objects in a reader account.
Trial accounts do not support this feature.