Use data metric functions to perform data quality checks¶
This topic describes how to associate a data metric function (DMF) with a table or view so it performs a data quality check at regular intervals. It also describes how to call a DMF directly, for example, if you want to test a DMF before associating it with a table or view.
Associate a DMF to automate data quality checks¶
You can associate a DMF with a table or view to automatically call it on regular intervals. When associating the DMF, you specify which columns are passed to the DMF as arguments.
The process of associating a DMF with a table or view to automate data quality checks is a two-step process:
Set a schedule on the table or view that determines how often DMFs execute. You must set the schedule before associating a DMF with the table or view.
Associate the DMF with the table or view.
Schedule the DMF to run¶
You can schedule DMFs to run to automate the data quality measurements on a table.
Snowflake uses the table owner role, which is the role with the OWNERSHIP privilege on the table, to call the scheduled DMF. The table owner role must have these grants:
The USAGE privilege on the user-defined DMF, and the USAGE privilege on the database and schema that store the user-defined DMF.
The DATA_METRIC_USER database role if the DMF is a system DMF.
The EXECUTE DATA METRIC FUNCTION privilege on the account.
The DATA_METRIC_SCHEDULE object parameter for a table, view, or materialized view enables you to schedule when your DMFs can run. All data metric functions on the table or view follow the same schedule.
There are three approaches to schedule your DMF to run:
Set the DMF to run after a specified number of minutes.
Use a cron expression to schedule the DMF to run at a particular frequency.
Use a trigger event to schedule the DMF to run when there is a DML change to the table, such as inserting a new row into the table. However:
The reclustering of tables doesn’t trigger a DMF to run.
The trigger approach is only available for certain kinds of tables. For more information, see ALTER TABLE … SET DATA_METRIC_SCHEDULE.
For example:
Set the data metric function schedule to run every 5 minutes:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = '5 MINUTE';
Set the data metric function schedule to run at 8:00 AM daily:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';
Set the data metric function schedule to run at 8:00 AM on weekdays only:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * MON,TUE,WED,THU,FRI UTC';
Set the data metric function schedule to run three times daily at 0600, 1200, and 1800 UTC:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'USING CRON 0 6,12,18 * * * UTC';
Set the data metric function to run when a general DML operation, such as inserting a new row, modifies the table:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
You can use the SHOW PARAMETERS command to view the DMF schedule for a supported table object:
SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE hr.tables.empl_info;+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+ | key | value | default | level | description | type | +----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+ | DATA_METRIC_SCHEDULE | USING CRON 0 6,12,18 * * * UTC | | TABLE | Specify the schedule that data metric functions associated to the table must be executed in order to be used for evaluation. | STRING | +----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+
For view and materialized view objects, specify TABLE
as the object domain and check the schedule as follows:
SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE mydb.public.my_view;
Note
There is a 10-minute lag from when you modify the DMF from a table for any scheduling changes to take effect on previous DMFs that are assigned to the table. However, new DMF assignments to the table are not subject to the 10 minute delay. Plan the DMF scheduling and DMF unsetting operations carefully to align with your expected DMF costs.
Additionally, when you evaluate the DMF results, such as by querying the
DATA_QUALITY_MONITORING_RESULTS view, specify the measurement_time
column in your query as the basis for the evaluation. There is an internal process that initiates the DMF evaluation, and it is possible
that table updates, such as INSERT operations, can occur between the scheduled time and the measurement time. When you use the
measurement_time
column, you have a more accurate assessment of the DMF results because the measurement time indicates the
evaluation time of the DMF.
Associate the DMF with a table¶
You use an ALTER TABLE or ALTER VIEW command to associate a DMF and specify which
columns are passed as arguments. For example, the following command associates the system DMF NULL_COUNT
with table t
. When the
DMF executes, it will return the number of NULL values in the column c1
.
ALTER TABLE t
ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT
ON (c1);
Note
Use the ALTER VIEW command to add a DMF on a materialized view column or drop a DMF from a materialized view column.
Drop a DMF from an object¶
You can drop a DMF using an ALTER TABLE or ALTER VIEW command. For example:
ALTER TABLE t DROP DATA METRIC FUNCTION governance.dmfs.count_positive_numbers ON (c1, c2, c3);
Call a DMF manually¶
Calling a DMF directly can be useful to test the output of the DMF before associating it with a table or view.
Use the following syntax to call a DMF:
SELECT <data_metric_function>(<query>)
Where:
data_metric_function
Specifies a system- or user-defined DMF.
query
Specifies a SQL query on a table or view.
The columns projected by the query must match the column arguments in the DMF signature.
Note
The following system DMFs don’t follow this syntax because they don’t take any arguments:
For example, to call a custom DMF count_positive_numbers
, which accepts three columns as arguments, execute the following:
SELECT governance.dmfs.count_positive_numbers(
SELECT c1, c2, c3
FROM t);
For example, to call the NULL_COUNT (system data metric function) system DMF to view the number of NULL values
in the ssn
column, execute the following:
SELECT SNOWFLAKE.CORE.NULL_COUNT(
SELECT ssn
FROM hr.tables.empl_info);