Use data metric functions to perform data quality checks

This topic describes how to associate a data metric function (DMF) with a table or view so it performs a data quality check at regular intervals. It also describes how to call a DMF directly, for example, if you want to test a DMF before associating it with a table or view.

Associate a DMF to automate data quality checks

You can associate a DMF with a table or view to automatically call it on regular intervals. When associating the DMF, you specify which columns are passed to the DMF as arguments.

The process of associating a DMF with a table or view to automate data quality checks is a two-step process:

  1. Set a schedule on the table or view that determines how often DMFs execute. You must set the schedule before associating a DMF with the table or view.

  2. Associate the DMF with the table or view.

Schedule the DMF to run

You can schedule DMFs to run to automate the data quality measurements on a table.

Snowflake uses the table owner role, which is the role with the OWNERSHIP privilege on the table, to call the scheduled DMF. The table owner role must have these grants:

  • The USAGE privilege on the user-defined DMF, and the USAGE privilege on the database and schema that store the user-defined DMF.

  • The DATA_METRIC_USER database role if the DMF is a system DMF.

  • The EXECUTE DATA METRIC FUNCTION privilege on the account.

The DATA_METRIC_SCHEDULE object parameter for a table, view, or materialized view enables you to schedule when your DMFs can run. All data metric functions on the table or view follow the same schedule.

There are three approaches to schedule your DMF to run:

  • Set the DMF to run after a specified number of minutes.

  • Use a cron expression to schedule the DMF to run at a particular frequency.

  • Use a trigger event to schedule the DMF to run when there is a DML change to the table, such as inserting a new row into the table. However:

For example:

Set the data metric function schedule to run every 5 minutes:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = '5 MINUTE';
Copy

Set the data metric function schedule to run at 8:00 AM daily:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';
Copy

Set the data metric function schedule to run at 8:00 AM on weekdays only:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * MON,TUE,WED,THU,FRI UTC';
Copy

Set the data metric function schedule to run three times daily at 0600, 1200, and 1800 UTC:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'USING CRON 0 6,12,18 * * * UTC';
Copy

Set the data metric function to run when a general DML operation, such as inserting a new row, modifies the table:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
Copy

You can use the SHOW PARAMETERS command to view the DMF schedule for a supported table object:

SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE hr.tables.empl_info;
Copy
+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+
| key                  | value                          | default | level | description                                                                                                                  | type   |
+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+
| DATA_METRIC_SCHEDULE | USING CRON 0 6,12,18 * * * UTC |         | TABLE | Specify the schedule that data metric functions associated to the table must be executed in order to be used for evaluation. | STRING |
+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+

For view and materialized view objects, specify TABLE as the object domain and check the schedule as follows:

SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE mydb.public.my_view;
Copy

Note

There is a 10-minute lag from when you modify the DMF from a table for any scheduling changes to take effect on previous DMFs that are assigned to the table. However, new DMF assignments to the table are not subject to the 10 minute delay. Plan the DMF scheduling and DMF unsetting operations carefully to align with your expected DMF costs.

Additionally, when you evaluate the DMF results, such as by querying the DATA_QUALITY_MONITORING_RESULTS view, specify the measurement_time column in your query as the basis for the evaluation. There is an internal process that initiates the DMF evaluation, and it is possible that table updates, such as INSERT operations, can occur between the scheduled time and the measurement time. When you use the measurement_time column, you have a more accurate assessment of the DMF results because the measurement time indicates the evaluation time of the DMF.

Associate the DMF with a table

You use an ALTER TABLE or ALTER VIEW command to associate a DMF and specify which columns are passed as arguments. For example, the following command associates the system DMF NULL_COUNT with table t. When the DMF executes, it will return the number of NULL values in the column c1.

ALTER TABLE t
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT
    ON (c1);
Copy

Note

Use the ALTER VIEW command to add a DMF on a materialized view column or drop a DMF from a materialized view column.

Drop a DMF from an object

You can drop a DMF using an ALTER TABLE or ALTER VIEW command. For example:

ALTER TABLE t
  DROP DATA METRIC FUNCTION governance.dmfs.count_positive_numbers
    ON (c1, c2, c3);
Copy

Call a DMF manually

Calling a DMF directly can be useful to test the output of the DMF before associating it with a table or view.

Use the following syntax to call a DMF:

SELECT <data_metric_function>(<query>)
Copy

Where:

data_metric_function

Specifies a system- or user-defined DMF.

query

Specifies a SQL query on a table or view.

The columns projected by the query must match the column arguments in the DMF signature.

Note

The following system DMFs don’t follow this syntax because they don’t take any arguments:

For example, to call a custom DMF count_positive_numbers, which accepts three columns as arguments, execute the following:

SELECT governance.dmfs.count_positive_numbers(
  SELECT c1, c2, c3
  FROM t);
Copy

For example, to call the NULL_COUNT (system data metric function) system DMF to view the number of NULL values in the ssn column, execute the following:

SELECT SNOWFLAKE.CORE.NULL_COUNT(
  SELECT ssn
  FROM hr.tables.empl_info);
Copy