Introduction to Data Quality and data metric functions

Data Quality uses data metric functions (DMFs), which include Snowflake-provided system DMFs and user-defined DMFs, to monitor the state and integrity of your data. You can use DMFs to measure key metrics, such as, but not limited to, freshness and counts that measure duplicates, NULLs, rows, and unique values.

About Data Quality and DMFs

Data Quality focuses on knowing the state and integrity of your data, which includes data freshness and accuracy with respect to true data values compared to null values or blank fields in a column, to make data-driven decisions. You can measure the quality of your data by using DMFs. Snowflake provides built-in system DMFs in the SNOWFLAKE.CORE schema to measure common metrics without having to define them. You can also define your own custom DMFs to fine-tune your data quality measurements more precisely, and these DMFs are stored in the database and schema of your choice.

Whether you use system DMFs, custom DMFs, or both, after you assign a DMF to a table or view, Snowflake records the results of calling the DMF in the event table for your account. You can specify the schedule for how often the DMF is called. For example, you can schedule the DMFs on a particular table to run three times daily, and you can modify the schedule as needed based on your own internal data quality requirements. All DMFs that are set on the table follow the same schedule.

After you schedule the DMFs to run, you can configure alerts to notify you when changes to data quality occur. By combining the DMF and alert functionality, you can have consistent threshold notifications for data quality on the tables that you measure. These insights enhance your data governance posture by enabling the following:

  • Data stewards to know the current state of their data based on a particular metric.

  • Data engineers to take immediate action on important tables and views.

  • Platform administrators to ensure data quality monitoring is done with cost, consistency, and performance.

The data quality workflow of defining, measuring, and monitoring data can then be applied to additional workloads.

For information, see:

Billing and pricing

Snowflake creates and manages the virtual warehouse objects to support this feature. You can use this feature without having to provision or use any additional virtual warehouse resources outside of your normal SQL query workflow.

Calling a DMF uses serverless compute resources as described in more detail in the Snowflake Consumption Table. The table lists the credit pricing per cloud region and Snowflake Edition.

The credits that you use are listed in the “Data Quality Monitoring” category on your monthly bill. These credits include compute consumed by all system or user-defined data quality metrics that you use. You are not billed for creating a DMF.

  • Billing occurs only when a scheduled DMF is computed on an object. You are not billed for unscheduled data metric function usage, such as calling a DMF with a SELECT statement.

  • The logging infrastructure consolidates metric outputs in the event table. Consumption incurred by the logging service shows up on your monthly bill as “Logging.”

For example, your Snowflake credits per compute hour are shown as follows:

Feature

Snowflake-managed Compute

Cloud Services

Data Quality Monitoring

2

1

Logging

1.25

1

For more information, see Serverless credit usage.

Tip

You can query the DATA_QUALITY_MONITORING_USAGE_HISTORY to track your credit consumption related to using DMFs in your account.

Benefits

Using DMFs enhances your data quality efforts and provides these benefits:

Facilitate compliance:

By knowing the state of your data, it’s easier to demonstrate how you adhere to compliance and regulatory standards. This decreases your risk and enhances your data governance posture.

Meet service-level agreements (SLAs):

Accurate data metrics, such as freshness, help to meet SLAs between data providers, data consumers, and clients.

Credibility:

DMFs provide data validation, which facilitates trustworthy data-driven decision making.

Consistency:

Using system DMFs and repeatedly using proper custom DMFs leads to consistent data quality evaluation over time. This enhances the credibility of your data.

Optimize for specific use cases:

Custom DMFs enable data engineers to design accurate metrics to measure the data, which leads to more accurate optimizations for targeted applications of the data.

Automated measurement:

Calling the DMF is automated after you assign the DMF to a table or view, specify the schedule for the DMF to run, and set an event table as the active event table for the account to record the results of calling the DMF. No additional work is needed to actively measure your data quality.

Performance management:

Snowflake evaluates how you use DMFs and optimizes the query evaluation to provide optimal query performance.

Considerations

For information about replication and DMFs, see Replication of data metric functions (DMFs).

Limitations

Note the following limitations when using DMFs:

  • Only regular tables are supported. You cannot set a DMF on any other kind of table, such as a dynamic table or external table.

  • Setting DMFs on objects:

    You can only have 1000 total associations of DMFs on objects per account. Each instance of setting a DMF on a table or view counts as one association.

  • Data sharing:

    You cannot grant privileges on a DMF to share or set a DMF on a shared table or view.

  • Setting a DMF on an object tag is not supported.

  • Using the CLONE operation and the CREATE TABLE … LIKE operation does not result in DMF assignments on the target object.

  • You cannot set a DMF on objects in a reader account.

  • You cannot create a dynamic table or a stream based on the data in the event table.

  • Trial accounts do not support this feature.

Managing DMFs

These sections summarize information on how to manage DMFs:

DMF privileges

Snowflake supports the following privileges to use with a DMF.

Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.

Privilege

Object

Usage

CREATE FUNCTION

Schema

Enables creating a new DMF in a schema.

USAGE

Data metric function

Enables calling the DMF.

EXECUTE DATA METRIC FUNCTION

Account

Enables using serverless compute resources when calling a DMF.

OWNERSHIP

Data metric function

Transfers ownership of the DMF, which grants full control over the DMF. Required to alter most properties of the DMF.

Additionally, if you’d like to manage the assignment of a DMF on a table or view, use a role with these privileges:

Privilege

Object

Notes

OWNERSHIP

Table or view

OWNERSHIP is a special privilege on an object that is automatically granted to the role that created the object, but can also be transferred using the GRANT OWNERSHIP command to a different role by the owning role (or any role with the MANAGE GRANTS privilege).

USAGE

Data metric function

This privilege is for user-defined DMFs.

For system DMFs, grant the SNOWFLAKE.DATA_METRIC_USER database role to the table owner role.

USAGE

Database, schema

These objects are the database and schema that store the user-defined DMF.

EXECUTE DATA METRIC FUNCTION

Account

This privilege enables you to control which roles have access to serverless compute resources.

DMF command reference

Snowflake supports the following commands to work with DMFs:

Additionally, use the ALTER TABLE and the ALTER VIEW commands to do the following:

  • Add or drop a data metric function on a column.

  • Add or drop a data metric function on the table or view itself.

  • Schedule the data metric function to run.

For representative examples, see Working with data metric functions.

When using the GRANT <privileges> and REVOKE <privileges> commands, you can:

  • Grant privileges to an account role or a database role. Other role types, such as application roles, are not supported.

  • Specify any supported DMF privilege or ALL PRIVILEGES.

You can use the GRANT OWNERSHIP command to transfer ownership to an account role.

The GRANT and REVOKE commands require you to specify the arguments of the user-defined DMF that you create. For example:

GRANT USAGE ON FUNCTION
  governance.dmfs.count_positive_numbers(TABLE(NUMBER, NUMBER, NUMBER))
  TO data_engineer;
Copy

Summary of DMF commands, operations, and privileges

The following table summarizes the relationship between DMF privileges and DDL operations.

Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.

In addition to the requirements in the table, if you’re using a system DMF, use the ACCOUNTADMIN role or a role that is granted IMPORTED PRIVILEGES on the shared SNOWFLAKE database.

Operation

Privilege required

Use a system DMF.

See System data metric functions.

Create a DMF with the CREATE FUNCTION command.

A role with the CREATE FUNCTION privilege on the schema.

Modify a DMF with an ALTER FUNCTION command.

A role with the OWNERSHIP privilege on the DMF.

Add a DMF or drop a DMF from a table or view with an ALTER TABLE or ALTER VIEW command.

A role with the following privileges:

  • OWNERSHIP on the table

  • USAGE on the DMF

  • EXECUTE DATA METRIC FUNCTION on the account

Identify metric references by calling the DATA_METRIC_FUNCTION_REFERENCES function.

Both of the following:

  • A role with the USAGE privilege on the DMF.

  • A role with the SELECT privilege on the table or view on which the DMF is added.

Describe a DMF with a DESCRIBE FUNCTION command.

A role with the USAGE privilege on the DMF.

Drop a DMF from the current or specified schema with a DROP FUNCTION command.

A role with the OWNERSHIP privilege on the DMF.

List data metric functions with a SHOW DATA METRIC FUNCTIONS or SHOW FUNCTIONS command.

A role with the USAGE privilege on the DMF.

Call a DMF manually using SELECT.

A role with the global EXECUTE DATA METRIC FUNCTION privilege or a role with the USAGE privilege on the DMF.