Working with data metric functions¶
Before trying each of the examples in this topic, be sure your role in use meets the necessary access control requirements.
Create your own DMF¶
If there isn’t a system data quality metric function (DMF) that can perform your data quality checks, then you can use the CREATE DATA METRIC FUNCTION command to create your own DMF.
- Example: User-defined DMF with single table argument
Create a DMF that calls the COUNT function to return the total number of rows that have positive numbers in the first three columns of the table:
CREATE OR REPLACE DATA METRIC FUNCTION governance.dmfs.count_positive_numbers( arg_t TABLE( arg_c1 NUMBER, arg_c2 NUMBER, arg_c3 NUMBER ) ) RETURNS NUMBER AS $$ SELECT COUNT(*) FROM arg_t WHERE arg_c1>0 AND arg_c2>0 AND arg_c3>0 $$;
- Example: Using multiple table arguments to perform referential checks
A user-defined DMF can have more than one argument that accepts a table. When you add the DMF to a table, that table is used as the first argument. If there is an additional argument that accepts a table, you must also specify the fully qualified name of the second table. This capability simplifies referential integrity, matching and comparison, or conditional checking across different datasets.
Suppose you want to validate referential integrity as defined by a primary key/foreign key relationship. In this case, you can create a DMF to validate that all records in a source table have corresponding records in the referenced table. The following user-defined DMF returns the number of records where the value of a column in one table does not have a corresponding value in the column of another table:
CREATE OR REPLACE DATA METRIC FUNCTION governance.dmfs.referential_check( arg_t1 TABLE (arg_c1 INT), arg_t2 TABLE (arg_c2 INT)) RETURNS NUMBER AS 'SELECT COUNT(*) FROM arg_t1 WHERE arg_c1 NOT IN (SELECT arg_c2 FROM arg_t2)';
Now suppose you want to check whether every order, as identified by its
sp_id
, in thesalesorders
table maps back to ansp_id
in thesalespeople
table. You can add the DMF to thesalesorders
table while specifying thesalespeople
table as the other table argument.ALTER TABLE salesorders ADD DATA METRIC FUNCTION governance.dmfs.referential_check ON (sp_id, TABLE (my_db.sch1.salespeople(sp_id)));
The output returns the number of rows in the
salesorders
table that have a value in thesp_id
column that doesn’t appear in thesp_id
column of thesalespeople
table. A value greater than 0 indicates that there aresp_id
values insalesorders
that don’t map to records insalespeople
.
Call your DMF manually¶
Use the following syntax to call a DMF:
SELECT <data_metric_function>(<query>)
Where:
data_metric_function
Specifies a system- or user-defined DMF.
query
Specifies a SQL query on a table or view.
The columns in the query must match the column arguments in the DMF signature.
Note
These system DMFs don’t follow this syntax because they don’t take any arguments:
Call the DMF directly and view the results:
SELECT governance.dmfs.count_positive_numbers(
SELECT c1,
c2,
c3
FROM t
);
When you set a DMF on a table or view and schedule it to run, you can query the view built from the event table to view the results. For information, see View the DMF results.
Add or drop a DMF on a table or view¶
Use an ALTER TABLE or ALTER VIEW command to do the following:
Add a DMF to a table using an ALTER TABLE.
ALTER TABLE t ADD DATA METRIC FUNCTION governance.dmfs.count_positive_numbers ON (c1, c2, c3);
Drop a DMF from a table using an ALTER TABLE command.
ALTER TABLE t DROP DATA METRIC FUNCTION governance.dmfs.count_positive_numbers ON (c1, c2, c3);
Note
Use the ALTER VIEW command to add a DMF on a materialized view column or drop a DMF from a materialized view column.
Schedule your DMFs to run¶
You can schedule your DMFs to run to automate the data quality measurements on a table.
Snowflake uses the table owner role, which is the role with the OWNERSHIP privilege on the table, to call the scheduled DMF. The table owner role must have these grants:
The USAGE privilege on the user-defined DMF, and the USAGE privilege on the database and schema that store the user-defined DMF.
The DATA_METRIC_USER database role if the DMF is a system DMF.
The EXECUTE DATA METRIC FUNCTION privilege on the account.
The DATA_METRIC_SCHEDULE object parameter for a table, view, or materialized view enables you to schedule when your DMFs can run. All data metric functions on the table or view follow the same schedule.
There are three approaches to schedule your DMF to run:
Set the DMF to run after a specified number of minutes.
Use a cron expression to schedule the DMF to run at a particular frequency.
Use a trigger event to schedule the DMF to run when there is a DML change to the table, such as inserting a new row into the table. However:
The reclustering of tables doesn’t trigger a DMF to run.
The trigger approach is only available for certain kinds of tables. For more information, see ALTER TABLE … SET DATA_METRIC_SCHEDULE.
For example:
Set the data metric function schedule to run every 5 minutes:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = '5 MINUTE';
Set the data metric function schedule to run at 8:00 AM daily:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';
Set the data metric function schedule to run at 8:00 AM on weekdays only:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * MON,TUE,WED,THU,FRI UTC';
Set the data metric function schedule to run three times daily at 0600, 1200, and 1800 UTC:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'USING CRON 0 6,12,18 * * * UTC';
Set the data metric function to run when a general DML operation, such as inserting a new row, modifies the table:
ALTER TABLE hr.tables.empl_info SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
You can use the SHOW PARAMETERS command to view the DMF schedule for a supported table object:
SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE hr.tables.empl_info;+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+ | key | value | default | level | description | type | +----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+ | DATA_METRIC_SCHEDULE | USING CRON 0 6,12,18 * * * UTC | | TABLE | Specify the schedule that data metric functions associated to the table must be executed in order to be used for evaluation. | STRING | +----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+
For view and materialized view objects, specify TABLE
as the object domain and check the schedule as follows:
SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE mydb.public.my_view;
Note
There is a 10-minute lag from when you modify the DMF from a table for any scheduling changes to take effect on previous DMFs that are assigned to the table. However, new DMF assignments to the table are not subject to the 10 minute delay. Plan the DMF scheduling and DMF unsetting operations carefully to align with your expected DMF costs.
Additionally, when you evaluate the DMF results, such as by querying the
DATA_QUALITY_MONITORING_RESULTS view, specify the measurement_time
column in your query as the basis for the evaluation. There is an internal process that initiates the DMF evaluation, and it is possible
that table updates, such as INSERT operations, can occur between the scheduled time and the measurement time. When you use the
measurement_time
column, you have a more accurate assessment of the DMF results because the measurement time indicates the
evaluation time of the DMF.
View the DMF results¶
To view DMF results, you first need to manage access to the results and then choose how to view the results:
Managing access to the DMF results¶
You have three options to access the results of a scheduled DMF, which are detailed in the next section:
Option 1: Query the dedicated event table.
Option 2: Query the DATA_QUALITY_MONITORING_RESULTS view, which is a flattened version of the event table.
Option 3: Call the DATA_QUALITY_MONITORING_RESULTS table function.
Depending on how you want to manage access to the DMF results, grant the following system application roles to another account-level role:
Application role |
Notes |
---|---|
DATA_QUALITY_MONITORING_ADMIN |
Can use options 1, 2, and 3. A role that is granted this application role can grant the DATA_QUALITY_MONITORING_VIEWER and DATA_QUALITY_MONITORING_LOOKUP application roles to other roles. A user that is granted the ACCOUNTADMIN role can grant the DATA_QUALITY_MONITORING_ADMIN application role to other roles. |
DATA_QUALITY_MONITORING_VIEWER |
Options 2 and 3. |
DATA_QUALITY_MONITORING_LOOKUP |
Option 3 only. |
For example, use the GRANT APPLICATION ROLE command to grant the DATA_QUALITY_MONITORING_VIEWER system
application role to the analyst
custom role:
USE ROLE ACCOUNTADMIN; GRANT APPLICATION ROLE SNOWFLAKE.DATA_QUALITY_MONITORING_VIEWER TO ROLE analyst;
Use the REVOKE APPLICATION ROLE command to revoke the system application role from an account role.
Tip
Additionally, if you want to manage access to the results of system DMFs, grant the SNOWFLAKE.DATA_METRIC_USER database role to the account role that can access the results. For more information, see GRANT DATABASE ROLE.
Options¶
To view the scheduled DMF results, choose one of the following options:
- Option 1: Query the dedicated event table
This option gives you access to the raw data, and you have more freedom to post-process the data using derived objects, such as creating views, table functions, or stored procedures based on how you want to analyze the results. Additionally, if you create these derived objects, you can selectively grant access on these objects to different roles. For example, a data engineer can access the stored procedures to maintain the approach to obtain the results, and a data analyst can access the view to analyze the results.
The event table is named
SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS_RAW
.For information about the event table columns, see Event table columns.
For a representative example to query the event table, see the logging and tracing tutorial.
- Option 2: Query the DATA_QUALITY_MONITORING_RESULTS view
This option enables you to query the DATA_QUALITY_MONITORING_RESULTS view, which flattens the raw data in the event table to enable easier access to the DMF results. Additionally, this option is best when data post-processing is not needed and when you don’t want to grant access to the raw data.
The view exists in the LOCAL schema in the shared SNOWFLAKE database:
SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
.For information, see the DATA_QUALITY_MONITORING_RESULTS view.
Note
The SNOWFLAKE.GOVERNANCE_VIEWER database role does not have access to query the DATA_QUALITY_MONITORING_RESULTS view.
- Option 3: Call the DATA_QUALITY_MONITORING_RESULTS table function
This option enables you to call the DATA_QUALITY_MONITORING_RESULTS table function to view the DMF results. The function returns the same columns as the DATA_QUALITY_MONITORING_RESULTS view. However, you can only specify a single table when calling the function. This option is best when you want to limit data metric function results to a single table and not provide access to the measurements of other tables or the event table.
In addition to the SNOWFLAKE.DATA_QUALITY_MONITORING_LOOKUP application role, the role used to call the DATA_QUALITY_MONITORING_RESULTS table function also needs the following privileges:
OWNERSHIP or SELECT on the table.
OWNERSHIP or USAGE on the DMF.
View the properties of a DMF¶
Describe the DMF to view its properties:
DESC FUNCTION governance.dmfs.count_positive_numbers(
TABLE(
NUMBER, NUMBER, NUMBER
)
);
+-----------+---------------------------------------------------------------------+
| property | value |
+-----------+---------------------------------------------------------------------+
| signature | (ARG_T TABLE(ARG_C1 NUMBER, ARG_C2 NUMBER, ARG_C3 NUMBER)) |
| returns | NUMBER(38,0) |
| language | SQL |
| body | SELECT COUNT(*) FROM arg_t WHERE arg_c1>0 AND arg_c2>0 AND arg_c3>0 |
+-----------+---------------------------------------------------------------------+
Secure the DMF¶
You can use the ALTER FUNCTION command to make a DMF secure. For more information about what it means for a function to be secure, see Protecting Sensitive Information with Secure UDFs and Stored Procedures.
ALTER FUNCTION governance.dmfs.count_positive_numbers(
TABLE(
NUMBER,
NUMBER,
NUMBER
))
SET SECURE;
List your DMFs¶
Use the SHOW DATA METRIC FUNCTIONS or SHOW FUNCTIONS command to list DMFs in your account, database, or schema:
SHOW DATA METRIC FUNCTIONS IN ACCOUNT;
Alternatively, you can query the Information Schema FUNCTIONS view or the Account Usage FUNCTIONS view to list your DMFs in the specified database or your account.
The is_data_metric
column specifies whether the function is a DMF.
Set a tag on a DMF¶
Use the ALTER FUNCTION command to set a tag on a DMF:
ALTER FUNCTION governance.dmfs.count_positive_numbers(
TABLE(NUMBER, NUMBER, NUMBER))
SET TAG governance.tags.quality = 'counts';
Track DMF references¶
You can query the DATA_METRIC_FUNCTION_REFERENCES Account Usage view to view DMF references in your account. You can also call the DATA_METRIC_FUNCTION_REFERENCES Information Schema table function to identify the DMFs associated with a given table or view or the tables or views associated with a given DMF.
For example:
Return a row for each object (table or view) that has the DMF named count_positive_numbers
set on that table or
view:
USE DATABASE governance; USE SCHEMA INFORMATION_SCHEMA; SELECT * FROM TABLE( INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES( METRIC_NAME => 'governance.dmfs.count_positive_numbers' ) );
Return a row for each DMF assigned to the table named hr.tables.empl_info
:
USE DATABASE governance; USE SCHEMA INFORMATION_SCHEMA; SELECT * FROM TABLE( INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES( REF_ENTITY_NAME => 'hr.tables.empl_info', REF_ENTITY_DOMAIN => 'table' ) );
Drop the DMF from the system¶
Drop the DMF from the system:
DROP FUNCTION governance.dmfs.count_positive_numbers(
TABLE(
NUMBER, NUMBER, NUMBER
)
);
Limitations¶
For information, see the section on data quality limitations.