Working with data metric functions

Before trying each of the examples in this topic, be sure your role in use meets the necessary access control requirements.

Create your own DMF

If there isn’t a system data quality metric function (DMF) that can perform your data quality checks, then you can use the CREATE DATA METRIC FUNCTION command to create your own DMF.

Example: User-defined DMF with single table argument

Create a DMF that calls the COUNT function to return the total number of rows that have positive numbers in the first three columns of the table:

CREATE OR REPLACE DATA METRIC FUNCTION governance.dmfs.count_positive_numbers(
  arg_t TABLE(
    arg_c1 NUMBER,
    arg_c2 NUMBER,
    arg_c3 NUMBER
  )
)
RETURNS NUMBER
AS
$$
  SELECT
    COUNT(*)
  FROM arg_t
  WHERE
    arg_c1>0
    AND arg_c2>0
    AND arg_c3>0
$$;
Copy
Example: Using multiple table arguments to perform referential checks

A user-defined DMF can have more than one argument that accepts a table. When you add the DMF to a table, that table is used as the first argument. If there is an additional argument that accepts a table, you must also specify the fully qualified name of the second table. This capability simplifies referential integrity, matching and comparison, or conditional checking across different datasets.

Suppose you want to validate referential integrity as defined by a primary key/foreign key relationship. In this case, you can create a DMF to validate that all records in a source table have corresponding records in the referenced table. The following user-defined DMF returns the number of records where the value of a column in one table does not have a corresponding value in the column of another table:

CREATE OR REPLACE DATA METRIC FUNCTION governance.dmfs.referential_check(
  arg_t1 TABLE (arg_c1 INT), arg_t2 TABLE (arg_c2 INT))
RETURNS NUMBER AS
 'SELECT COUNT(*) FROM arg_t1
  WHERE arg_c1 NOT IN (SELECT arg_c2 FROM arg_t2)';
Copy

Now suppose you want to check whether every order, as identified by its sp_id, in the salesorders table maps back to an sp_id in the salespeople table. You can add the DMF to the salesorders table while specifying the salespeople table as the other table argument.

ALTER TABLE salesorders
  ADD DATA METRIC FUNCTION governance.dmfs.referential_check
    ON (sp_id, TABLE (my_db.sch1.salespeople(sp_id)));
Copy

The output returns the number of rows in the salesorders table that have a value in the sp_id column that doesn’t appear in the sp_id column of the salespeople table. A value greater than 0 indicates that there are sp_id values in salesorders that don’t map to records in salespeople.

Call your DMF manually

Use the following syntax to call a DMF:

SELECT <data_metric_function>(<query>)
Copy

Where:

data_metric_function

Specifies a system- or user-defined DMF.

query

Specifies a SQL query on a table or view.

The columns in the query must match the column arguments in the DMF signature.

Note

These system DMFs don’t follow this syntax because they don’t take any arguments:

Call the DMF directly and view the results:

SELECT governance.dmfs.count_positive_numbers(
    SELECT c1,
           c2,
           c3
    FROM t
  );
Copy

When you set a DMF on a table or view and schedule it to run, you can query the view built from the event table to view the results. For information, see View the DMF results.

Add or drop a DMF on a table or view

Use an ALTER TABLE or ALTER VIEW command to do the following:

  • Add a DMF to a table using an ALTER TABLE.

    ALTER TABLE t
      ADD DATA METRIC FUNCTION governance.dmfs.count_positive_numbers
        ON (c1, c2, c3);
    
    Copy
  • Drop a DMF from a table using an ALTER TABLE command.

    ALTER TABLE t
      DROP DATA METRIC FUNCTION governance.dmfs.count_positive_numbers
        ON (c1, c2, c3);
    
    Copy

Note

Use the ALTER VIEW command to add a DMF on a materialized view column or drop a DMF from a materialized view column.

Schedule your DMFs to run

You can schedule your DMFs to run to automate the data quality measurements on a table.

Snowflake uses the table owner role, which is the role with the OWNERSHIP privilege on the table, to call the scheduled DMF. The table owner role must have these grants:

  • The USAGE privilege on the user-defined DMF, and the USAGE privilege on the database and schema that store the user-defined DMF.

  • The DATA_METRIC_USER database role if the DMF is a system DMF.

  • The EXECUTE DATA METRIC FUNCTION privilege on the account.

The DATA_METRIC_SCHEDULE object parameter for a table, view, or materialized view enables you to schedule when your DMFs can run. All data metric functions on the table or view follow the same schedule.

There are three approaches to schedule your DMF to run:

  • Set the DMF to run after a specified number of minutes.

  • Use a cron expression to schedule the DMF to run at a particular frequency.

  • Use a trigger event to schedule the DMF to run when there is a DML change to the table, such as inserting a new row into the table. However:

For example:

Set the data metric function schedule to run every 5 minutes:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = '5 MINUTE';
Copy

Set the data metric function schedule to run at 8:00 AM daily:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';
Copy

Set the data metric function schedule to run at 8:00 AM on weekdays only:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * MON,TUE,WED,THU,FRI UTC';
Copy

Set the data metric function schedule to run three times daily at 0600, 1200, and 1800 UTC:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'USING CRON 0 6,12,18 * * * UTC';
Copy

Set the data metric function to run when a general DML operation, such as inserting a new row, modifies the table:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
Copy

You can use the SHOW PARAMETERS command to view the DMF schedule for a supported table object:

SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE hr.tables.empl_info;
Copy
+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+
| key                  | value                          | default | level | description                                                                                                                  | type   |
+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+
| DATA_METRIC_SCHEDULE | USING CRON 0 6,12,18 * * * UTC |         | TABLE | Specify the schedule that data metric functions associated to the table must be executed in order to be used for evaluation. | STRING |
+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+

For view and materialized view objects, specify TABLE as the object domain and check the schedule as follows:

SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE mydb.public.my_view;
Copy

Note

There is a 10-minute lag from when you modify the DMF from a table for any scheduling changes to take effect on previous DMFs that are assigned to the table. However, new DMF assignments to the table are not subject to the 10 minute delay. Plan the DMF scheduling and DMF unsetting operations carefully to align with your expected DMF costs.

Additionally, when you evaluate the DMF results, such as by querying the DATA_QUALITY_MONITORING_RESULTS view, specify the measurement_time column in your query as the basis for the evaluation. There is an internal process that initiates the DMF evaluation, and it is possible that table updates, such as INSERT operations, can occur between the scheduled time and the measurement time. When you use the measurement_time column, you have a more accurate assessment of the DMF results because the measurement time indicates the evaluation time of the DMF.

View the DMF results

To view DMF results, you first need to manage access to the results and then choose how to view the results:

Managing access to the DMF results

You have three options to access the results of a scheduled DMF, which are detailed in the next section:

Depending on how you want to manage access to the DMF results, grant the following system application roles to another account-level role:

Application role

Notes

DATA_QUALITY_MONITORING_ADMIN

Can use options 1, 2, and 3.

A role that is granted this application role can grant the DATA_QUALITY_MONITORING_VIEWER and DATA_QUALITY_MONITORING_LOOKUP application roles to other roles.

A user that is granted the ACCOUNTADMIN role can grant the DATA_QUALITY_MONITORING_ADMIN application role to other roles.

DATA_QUALITY_MONITORING_VIEWER

Options 2 and 3.

DATA_QUALITY_MONITORING_LOOKUP

Option 3 only.

For example, use the GRANT APPLICATION ROLE command to grant the DATA_QUALITY_MONITORING_VIEWER system application role to the analyst custom role:

USE ROLE ACCOUNTADMIN;
GRANT APPLICATION ROLE SNOWFLAKE.DATA_QUALITY_MONITORING_VIEWER TO ROLE analyst;
Copy

Use the REVOKE APPLICATION ROLE command to revoke the system application role from an account role.

Tip

Additionally, if you want to manage access to the results of system DMFs, grant the SNOWFLAKE.DATA_METRIC_USER database role to the account role that can access the results. For more information, see GRANT DATABASE ROLE.

Options

To view the scheduled DMF results, choose one of the following options:

Option 1: Query the dedicated event table

This option gives you access to the raw data, and you have more freedom to post-process the data using derived objects, such as creating views, table functions, or stored procedures based on how you want to analyze the results. Additionally, if you create these derived objects, you can selectively grant access on these objects to different roles. For example, a data engineer can access the stored procedures to maintain the approach to obtain the results, and a data analyst can access the view to analyze the results.

The event table is named SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS_RAW.

For information about the event table columns, see Event table columns.

For a representative example to query the event table, see the logging and tracing tutorial.

Option 2: Query the DATA_QUALITY_MONITORING_RESULTS view

This option enables you to query the DATA_QUALITY_MONITORING_RESULTS view, which flattens the raw data in the event table to enable easier access to the DMF results. Additionally, this option is best when data post-processing is not needed and when you don’t want to grant access to the raw data.

The view exists in the LOCAL schema in the shared SNOWFLAKE database: SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS.

For information, see the DATA_QUALITY_MONITORING_RESULTS view.

Note

The SNOWFLAKE.GOVERNANCE_VIEWER database role does not have access to query the DATA_QUALITY_MONITORING_RESULTS view.

Option 3: Call the DATA_QUALITY_MONITORING_RESULTS table function

This option enables you to call the DATA_QUALITY_MONITORING_RESULTS table function to view the DMF results. The function returns the same columns as the DATA_QUALITY_MONITORING_RESULTS view. However, you can only specify a single table when calling the function. This option is best when you want to limit data metric function results to a single table and not provide access to the measurements of other tables or the event table.

In addition to the SNOWFLAKE.DATA_QUALITY_MONITORING_LOOKUP application role, the role used to call the DATA_QUALITY_MONITORING_RESULTS table function also needs the following privileges:

  • OWNERSHIP or SELECT on the table.

  • OWNERSHIP or USAGE on the DMF.

View the properties of a DMF

Describe the DMF to view its properties:

DESC FUNCTION governance.dmfs.count_positive_numbers(
  TABLE(
    NUMBER, NUMBER, NUMBER
  )
);
Copy
+-----------+---------------------------------------------------------------------+
| property  | value                                                               |
+-----------+---------------------------------------------------------------------+
| signature | (ARG_T TABLE(ARG_C1 NUMBER, ARG_C2 NUMBER, ARG_C3 NUMBER))          |
| returns   | NUMBER(38,0)                                                        |
| language  | SQL                                                                 |
| body      | SELECT COUNT(*) FROM arg_t WHERE arg_c1>0 AND arg_c2>0 AND arg_c3>0 |
+-----------+---------------------------------------------------------------------+

Secure the DMF

You can use the ALTER FUNCTION command to make a DMF secure. For more information about what it means for a function to be secure, see Protecting Sensitive Information with Secure UDFs and Stored Procedures.

ALTER FUNCTION governance.dmfs.count_positive_numbers(
 TABLE(
   NUMBER,
   NUMBER,
   NUMBER
))
SET SECURE;
Copy

List your DMFs

Use the SHOW DATA METRIC FUNCTIONS or SHOW FUNCTIONS command to list DMFs in your account, database, or schema:

SHOW DATA METRIC FUNCTIONS IN ACCOUNT;
Copy

Alternatively, you can query the Information Schema FUNCTIONS view or the Account Usage FUNCTIONS view to list your DMFs in the specified database or your account.

The is_data_metric column specifies whether the function is a DMF.

Set a tag on a DMF

Use the ALTER FUNCTION command to set a tag on a DMF:

ALTER FUNCTION governance.dmfs.count_positive_numbers(
  TABLE(NUMBER, NUMBER, NUMBER))
  SET TAG governance.tags.quality = 'counts';
Copy

Track DMF references

You can query the DATA_METRIC_FUNCTION_REFERENCES Account Usage view to view DMF references in your account. You can also call the DATA_METRIC_FUNCTION_REFERENCES Information Schema table function to identify the DMFs associated with a given table or view or the tables or views associated with a given DMF.

For example:

Return a row for each object (table or view) that has the DMF named count_positive_numbers set on that table or view:

USE DATABASE governance;
USE SCHEMA INFORMATION_SCHEMA;
SELECT *
  FROM TABLE(
    INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
      METRIC_NAME => 'governance.dmfs.count_positive_numbers'
    )
  );
Copy

Return a row for each DMF assigned to the table named hr.tables.empl_info:

USE DATABASE governance;
USE SCHEMA INFORMATION_SCHEMA;
SELECT *
  FROM TABLE(
    INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
      REF_ENTITY_NAME => 'hr.tables.empl_info',
      REF_ENTITY_DOMAIN => 'table'
    )
  );
Copy

Drop the DMF from the system

Drop the DMF from the system:

DROP FUNCTION governance.dmfs.count_positive_numbers(
  TABLE(
    NUMBER, NUMBER, NUMBER
  )
);
Copy

Limitations

For information, see the section on data quality limitations.