Working with data metric functions¶
Before trying each of the examples in this topic, be sure your role in use meets the necessary access control requirements.
Measure now: system DMF example¶
Use the following syntax to call a DMF:
SELECT <data_metric_function>(<query>)
Where:
data_metric_function
Specifies a system- or user-defined DMF.
query
Specifies a SQL query on a table or view.
The columns in the query must match the column arguments in the DMF signature.
Note
These system DMFs don’t follow this syntax because they don’t take any arguments:
Call a system DMF to start measuring your data quality immediately.
For example, call the NULL_COUNT (system data metric function) system DMF to view the number of NULL values in the SSN column (US Social Security number):
USE ROLE data_engineer;
SELECT SNOWFLAKE.CORE.NULL_COUNT(
SELECT ssn
FROM hr.tables.empl_info
);
You can query the event table to view the results. For information, see View the DMF results.
Create your own DMF¶
Use the CREATE DATA METRIC FUNCTION command to create your own DMFs.
Create a DMF that calls the COUNT function to return the total number of rows that don’t contain NULL values when evaluating the first three columns in the table:
CREATE OR REPLACE DATA METRIC FUNCTION governance.dmfs.count_positive_numbers(
arg_t TABLE(
arg_c1 NUMBER,
arg_c2 NUMBER,
arg_c3 NUMBER
)
)
RETURNS NUMBER
AS
$$
SELECT
COUNT(*)
FROM arg_t
WHERE
arg_c1>0
AND arg_c2>0
AND arg_c3>0
$$;
For an example that references a table other than the table to which you assign the DMF, see Example: Foreign key reference.
Note
If you need to view the DMF definition, use the DESCRIBE FUNCTION (DMF) command.
View the properties of a DMF¶
Describe the DMF to view its properties:
DESC FUNCTION governance.dmfs.count_positive_numbers(
TABLE(
NUMBER, NUMBER, NUMBER
)
);
+-----------+---------------------------------------------------------------------+
| property | value |
+-----------+---------------------------------------------------------------------+
| signature | (ARG_T TABLE(ARG_C1 NUMBER, ARG_C2 NUMBER, ARG_C3 NUMBER)) |
| returns | NUMBER(38,0) |
| language | SQL |
| body | SELECT COUNT(*) FROM arg_t WHERE arg_c1>0 AND arg_c2>0 AND arg_c3>0 |
+-----------+---------------------------------------------------------------------+
Secure the DMF¶
Make the DMF a secure DMF:
ALTER FUNCTION governance.dmfs.count_positive_numbers(
TABLE(
NUMBER,
NUMBER,
NUMBER
))
SET SECURE;
List your DMFs¶
Use the SHOW DATA METRIC FUNCTIONS or SHOW FUNCTIONS command to list DMFs in your account, database, or schema:
SHOW DATA METRIC FUNCTIONS IN ACCOUNT;
Alternatively, you can query Information Schema FUNCTIONS view or the Account Usage FUNCTIONS view to list your DMFs in the specified database or your account, respectively.
The is_data_metric
column specifies whether the function is a DMF.
Set a tag on a DMF¶
Use the ALTER FUNCTION command to set a tag on a DMF:
ALTER FUNCTION governance.dmfs.count_positive_numbers(
TABLE(NUMBER, NUMBER, NUMBER))
SET TAG governance.tags.quality = 'counts';
Schedule your DMFs to run¶
You can schedule your DMFs to run to automate the data quality measurements on a table.
Snowflake uses the table owner role, which is the role with the OWNERSHIP privilege on the table, to call the scheduled DMF. The table owner role must have these grants:
The USAGE privilege on the user-defined DMF, and the USAGE privilege on the database and schema that store the user-defined DMF.
The DATA_METRIC_USER database role if the DMF is a system DMF.
The EXECUTE DATA METRIC FUNCTION privilege on the account.
The DATA_METRIC_SCHEDULE object parameter for a table, view, or materialized view enables you to schedule when your DMFs can run. All data metric functions on the table or view follow the same schedule.
There are three approaches to schedule your DMF to run:
Set the DMF to run after a specified number of minutes.
Use a cron expression to schedule the DMF to run at a particular frequency.
Use a trigger event to schedule the DMF to run when there is a DML change to the table, such as inserting a new row into the table. However:
The reclustering of tables doesn’t trigger a DMF to run.
The trigger approach is only available for certain kinds of tables. For more information, see ALTER TABLE … SET DATA_METRIC_SCHEDULE.
For example:
Set the data metric function schedule to run every 5 minutes:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = '5 MINUTE';
Set the data metric function schedule to run at 8:00 AM daily:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';
Set the data metric function schedule to run at 8:00 AM on weekdays only:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * MON,TUE,WED,THU,FRI UTC';
Set the data metric function schedule to run three times daily at 0600, 1200, and 1800 UTC:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'USING CRON 0 6,12,18 * * * UTC';
Set the data metric function to run when a general DML operation, such as inserting a new row, modifies the table:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
You can use the SHOW PARAMETERS command to view the DMF schedule for a supported table object:
SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE hr.tables.empl_info;+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+ | key | value | default | level | description | type | +----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+ | DATA_METRIC_SCHEDULE | USING CRON 0 6,12,18 * * * UTC | | TABLE | Specify the schedule that data metric functions associated to the table must be executed in order to be used for evaluation. | STRING | +----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+
For view and materialized view objects, specify TABLE
as the object domain and check the schedule as follows:
SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE mydb.public.my_view;
Note
There is a 10-minute lag from when you modify the DMF from a table for any scheduling changes to take effect on previous DMFs that are assigned to the table. However, new DMF assignments to the table are not subject to the 10 minute delay. Plan the DMF scheduling and DMF unsetting operations carefully to align with your expected DMF costs.
Additionally, when you evaluate the DMF results, such as by querying the
DATA_QUALITY_MONITORING_RESULTS view, specify the measurement_time
column in your query as the basis for the evaluation. There is an internal process that initiates the DMF evaluation, and it is possible
that table updates, such as INSERT operations, can occur between the scheduled time and the measurement time. When you use the
measurement_time
column, you have a more accurate assessment of the DMF results because the measurement time indicates the
evaluation time of the DMF.
Add or drop a DMF on a table or view¶
Use an ALTER TABLE or ALTER VIEW command to do the following:
Add a DMF to a table using an ALTER TABLE.
ALTER TABLE t ADD DATA METRIC FUNCTION governance.dmfs.count_positive_numbers ON (c1, c2, c3);
Drop a DMF from a table using an ALTER TABLE command.
ALTER TABLE t DROP DATA METRIC FUNCTION governance.dmfs.count_positive_numbers ON (c1, c2, c3);
Note
Use the ALTER VIEW command to add a DMF on a materialized view column or drop a DMF from a materialized view column.
Track DMF references¶
You can query the DATA_METRIC_FUNCTION_REFERENCES Account Usage view to view DMF references in your account, or call the DATA_METRIC_FUNCTION_REFERENCES Information Schema table function to identify the DMFs associated with a given table or view or the tables or views associated with a given DMF.
For example:
Return a row for each object (table or view) that has the DMF named count_positive_numbers
set on that table or
view:
USE DATABASE governance; USE SCHEMA INFORMATION_SCHEMA; SELECT * FROM TABLE( INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES( METRIC_NAME => 'governance.dmfs.count_positive_numbers' ) );
Return a row for each DMF assigned to the table named hr.tables.empl_info
:
USE DATABASE governance; USE SCHEMA INFORMATION_SCHEMA; SELECT * FROM TABLE( INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES( REF_ENTITY_NAME => 'hr.tables.empl_info', REF_ENTITY_DOMAIN => 'table' ) );
Call your DMF manually¶
Use the following syntax to call a DMF:
SELECT <data_metric_function>(<query>)
Where:
data_metric_function
Specifies a system- or user-defined DMF.
query
Specifies a SQL query on a table or view.
The columns in the query must match the column arguments in the DMF signature.
Note
These system DMFs don’t follow this syntax because they don’t take any arguments:
Call the DMF directly and view the results:
SELECT governance.dmfs.count_positive_numbers(
SELECT c1,
c2,
c3
FROM t
);
When you set a DMF on a table or view and schedule it to run, you can query the view built from the event table to view the results. For information, see View the DMF results.
View the DMF results¶
To view DMF results, you first need to manage access to the results and then choose how to view the results:
Managing access to the DMF results¶
You have three options to access the results of a scheduled DMF, which are detailed in the next section:
Option 1: Query the dedicated event table.
Option 2: Query the DATA_QUALITY_MONITORING_RESULTS view, which is a flattened version of the event table.
Option 3: Call the DATA_QUALITY_MONITORING_RESULTS table function.
Depending on how you want to manage access to the DMF results, grant the following system application roles to another account-level role:
Application role |
Notes |
---|---|
DATA_QUALITY_MONITORING_ADMIN |
Can use options 1, 2, and 3. A role that is granted this application role can grant the DATA_QUALITY_MONITORING_VIEWER and DATA_QUALITY_MONITORING_LOOKUP application roles to other roles. A user that is granted the ACCOUNTADMIN role can grant the DATA_QUALITY_MONITORING_ADMIN application role to other roles. |
DATA_QUALITY_MONITORING_VIEWER |
Options 2 and 3. |
DATA_QUALITY_MONITORING_LOOKUP |
Option 3 only. |
For example, use the GRANT APPLICATION ROLE command to grant the DATA_QUALITY_MONITORING_VIEWER system
application role to the analyst
custom role:
USE ROLE ACCOUNTADMIN; GRANT APPLICATION ROLE SNOWFLAKE.DATA_QUALITY_MONITORING_VIEWER TO ROLE analyst;
Use the REVOKE APPLICATION ROLE command to revoke the system application role from an account role.
Tip
Additionally, if you want to manage access to the results of system DMFs, grant the SNOWFLAKE.DATA_METRIC_USER database role to the account role that can access the results. For more information, see GRANT DATABASE ROLE.
Options¶
To view the scheduled DMF results, choose one of the following options:
- Option 1: Query the dedicated event table
This option gives you access to the raw data, and you have more freedom to post-process the data using derived objects, such as creating views, table functions, or stored procedures based on how you want to analyze the results. Additionally, if you create these derived objects, you can selectively grant access on these objects to different roles. For example, a data engineer can access the stored procedures to maintain the approach to obtain the results, and a data analyst can access the view to analyze the results.
The event table is named
SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS_RAW
.For information about the event table columns, see Event table columns.
For a representative example to query the event table, see the logging and tracing tutorial.
- Option 2: Query the DATA_QUALITY_MONITORING_RESULTS view
This option enables you to query the DATA_QUALITY_MONITORING_RESULTS view, which flattens the raw data in the event table to enable easier access to the DMF results. Additionally, this option is best when data post-processing is not needed and when you don’t want to grant access to the raw data.
The view exists in the LOCAL schema in the shared SNOWFLAKE database:
SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
.For information, see the DATA_QUALITY_MONITORING_RESULTS view.
Note
The SNOWFLAKE.GOVERNANCE_VIEWER database role does not have access to query the DATA_QUALITY_MONITORING_RESULTS view.
- Option 3: Call the DATA_QUALITY_MONITORING_RESULTS table function
This option enables you to call the DATA_QUALITY_MONITORING_RESULTS table function to view the DMF results. The function returns the same columns as the DATA_QUALITY_MONITORING_RESULTS view. However, you can only specify a single table when calling the function. This option is best when you want to limit data metric function results to a single table and not provide access to the measurements of other tables or the event table.
In addition to the SNOWFLAKE.DATA_QUALITY_MONITORING_LOOKUP application role, the role used to call the DATA_QUALITY_MONITORING_RESULTS table function also needs the following privileges:
OWNERSHIP or SELECT on the table.
OWNERSHIP or USAGE on the DMF.
Drop the DMF from the system¶
Drop the DMF from the system:
DROP FUNCTION governance.dmfs.count_positive_numbers(
TABLE(
NUMBER, NUMBER, NUMBER
)
);
Limitations¶
For information, see the section on data quality limitations.