Working with data metric functions

Before trying each of the examples in this topic, be sure your role in use meets the necessary access control requirements.

Measure now: system DMF example

Use the following syntax to call a DMF:

SELECT <data_metric_function>(<query>)
Copy

Where:

data_metric_function

Specifies a system- or user-defined DMF.

query

Specifies a SQL query on a table or view.

The columns in the query must match the column arguments in the DMF signature.

Note

These system DMFs don’t follow this syntax because they don’t take any arguments:

Call a system DMF to start measuring your data quality immediately.

For example, call the NULL_COUNT system DMF to view the number of NULL values in the SSN column (that is, US Social Security number):

USE ROLE data_engineer;
SELECT SNOWFLAKE.CORE.NULL_COUNT(
  SELECT ssn
  FROM hr.tables.empl_info
);
Copy

You can query the event table to view the results. For information, see View the DMF results.

Create your own DMF

Use the CREATE DATA METRIC FUNCTION command to create your own DMFs.

Create a DMF that calls the COUNT function to return the total number of rows that don’t contain NULL values when evaluating the first three columns in the table:

CREATE OR REPLACE DATA METRIC FUNCTION governance.dmfs.count_positive_numbers(
  arg_t TABLE(
    arg_c1 NUMBER,
    arg_c2 NUMBER,
    arg_c3 NUMBER
  )
)
RETURNS NUMBER
AS
$$
  SELECT
    COUNT(*)
  FROM arg_t
  WHERE
    arg_c1>0
    AND arg_c2>0
    AND arg_c3>0
$$;
Copy

For an example that references a table other than the table to which you assign the DMF, see Example: Foreign key reference.

Note

If you need to view the DMF definition, use the DESCRIBE FUNCTION (DMF) command.

View the properties of a DMF

Describe the DMF to view its properties:

DESC FUNCTION governance.dmfs.count_positive_numbers(
  TABLE(
    NUMBER, NUMBER, NUMBER
  )
);
Copy
+-----------+---------------------------------------------------------------------+
| property  | value                                                               |
+-----------+---------------------------------------------------------------------+
| signature | (ARG_T TABLE(ARG_C1 NUMBER, ARG_C2 NUMBER, ARG_C3 NUMBER))          |
| returns   | NUMBER(38,0)                                                        |
| language  | SQL                                                                 |
| body      | SELECT COUNT(*) FROM arg_t WHERE arg_c1>0 AND arg_c2>0 AND arg_c3>0 |
+-----------+---------------------------------------------------------------------+

Secure the DMF

Make the DMF a secure DMF:

ALTER FUNCTION governance.dmfs.count_positive_numbers(
arg_t TABLE(
  arg_c1 NUMBER,
  arg_c2 NUMBER,
  arg_c3 NUMBER
))
SET SECURE;
Copy

List your DMFs

Use the SHOW DATA METRIC FUNCTIONS or SHOW FUNCTIONS command to list DMFs in your account, database, or schema:

SHOW DATA METRIC FUNCTIONS IN ACCOUNT;
Copy

The is_data_metric column specifies whether the function is a DMF, with either Y or N (yes or no) as possible column values.

Set a tag on a DMF

Use the ALTER FUNCTION command to set a tag on a DMF:

ALTER FUNCTION governance.dmfs.count_positive_numbers(
  TABLE(NUMBER, NUMBER, NUMBER))
  SET TAG governance.tags.quality = 'counts';
Copy

Add or drop a DMF on a table or view

Use an ALTER TABLE or ALTER VIEW command to do the following:

  • Add a DMF to a table using an ALTER TABLE.

    ALTER TABLE t
      ADD DATA METRIC FUNCTION governance.dmfs.count_positive_numbers
      ON (c1, c2, c3);
    
    Copy
  • Drop a DMF from a table using an ALTER TABLE command.

    ALTER TABLE t
      DROP DATA METRIC FUNCTION governance.dmfs.count_positive_numbers
      ON (c1, c2, c3);
    
    Copy

Track DMF references

You can query the DATA_METRIC_FUNCTION_REFERENCES Account Usage view to view DMF references in your account, or call the DATA_METRIC_FUNCTION_REFERENCES Information Schema table function to identify the DMFs associated with a given table or view or the tables or views associated with a given DMF.

For example:

Return a row for each object (table or view) that has the DMF named count_positive_numbers set on that table or view:

USE DATABASE governance;
USE SCHEMA INFORMATION_SCHEMA;
SELECT *
  FROM TABLE(
    INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
      METRIC_NAME => 'governance.dmfs.count_positive_numbers'
    )
  );
Copy

Return a row for each DMF assigned to the table named hr.tables.empl_info:

USE DATABASE governance;
USE SCHEMA INFORMATION_SCHEMA;
SELECT *
  FROM TABLE(
    INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
      REF_ENTITY_NAME => 'hr.tables.empl_info',
      REF_ENTITY_DOMAIN => 'table'
    )
  );
Copy

Schedule your DMFs to run

You can schedule your DMFs to run to automate the data quality measurements on a table.

Snowflake uses the table owner role, which is the role with the OWNERSHIP privilege on the table, to call the scheduled DMF. The table owner role must have these grants:

  • The USAGE privilege on the user-defined DMF, and the USAGE privilege on the database and schema that store the user-defined DMF.

  • The DATA_METRIC_USER database role if the DMF is a system DMF.

  • The EXECUTE DATA METRIC FUNCTION privilege on the account.

The DATA_METRIC_SCHEDULE object parameter for a table or view enables you to schedule when your DMFs can run. All data metric functions on the table or view follow the same schedule.

There are three approaches to schedule your DMF to run:

  • Set the DMF to run after a specified number of minutes.

  • Use a cron expression to schedule the DMF to run at a particular frequency.

  • Use a trigger event to schedule the DMF to run when there is a DML change to the table, such as

    inserting a new row into the table. However, the reclustering of tables doesn’t trigger a DMF to run.

You can set the DATA_METRIC_SCHEDULE parameter with an ALTER TABLE or ALTER VIEW command. For example:

Set the data metric function schedule to run every 5 minutes:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = '5 MINUTE';
Copy

Set the data metric function schedule to run at 8:00 AM daily:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * * UTC';
Copy

Set the data metric function schedule to run at 8:00 AM on weekdays only:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'USING CRON 0 8 * * MON,TUE,WED,THU,FRI UTC';
Copy

Set the data metric function schedule to run three times daily at 0600, 1200, and 1800 UTC:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'USING CRON 0 6,12,18 * * * UTC';
Copy

Set the data metric function to run when a general DML operation, such as inserting a new row, modifies the table:

ALTER TABLE hr.tables.empl_info SET
  DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
Copy

You can use the SHOW PARAMETERS command to view the DMF schedule for a given table or view. For example:

SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE hr.tables.empl_info;
Copy
+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+
| key                  | value                          | default | level | description                                                                                                                  | type   |
+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+
| DATA_METRIC_SCHEDULE | USING CRON 0 6,12,18 * * * UTC |         | TABLE | Specify the schedule that data metric functions associated to the table must be executed in order to be used for evaluation. | STRING |
+----------------------+--------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------------------+--------+

For view and materialized view objects, specify TABLE as the object domain and check the schedule as follows:

SHOW PARAMETERS LIKE 'DATA_METRIC_SCHEDULE' IN TABLE mydb.public.my_view;
Copy

Call your DMF manually

Use the following syntax to call a DMF:

SELECT <data_metric_function>(<query>)
Copy

Where:

data_metric_function

Specifies a system- or user-defined DMF.

query

Specifies a SQL query on a table or view.

The columns in the query must match the column arguments in the DMF signature.

Note

These system DMFs don’t follow this syntax because they don’t take any arguments:

Call the DMF directly and view the results:

SELECT governance.dmfs.count_positive_numbers(
    SELECT c1,
           c2,
           c3
    FROM t
  );
Copy

When you set a DMF on a table or view and schedule it to run, you can query the view built from the event table to view the results. For information, see View the DMF results.

View the DMF results

Snowflake maintains and stores the results of calling a DMF in a dedicated event table. To view the DMF results, choose one of the following options:

Depending on how you want to manage access to the DMF results, these options require either or both of the following system application roles to be granted to one or more account roles:

Application role

Description

DATA_QUALITY_MONITORING_ADMIN

Enables these operations:

DATA_QUALITY_MONITORING_VIEWER

Querying the view (option 2, only)

For example, use the GRANT APPLICATION ROLE command to grant the DATA_QUALITY_MONITORING_VIEWER system application role to the analyst custom role:

USE ROLE ACCOUNTADMIN;
GRANT APPLICATION ROLE SNOWFLAKE.DATA_QUALITY_MONITORING_VIEWER TO ROLE analyst;
Copy

Use the REVOKE APPLICATION ROLE command to revoke the system application role from an account role.

Option 1: Query the event table

This option gives you access to the raw data, and you have more freedom to post-process the data using derived objects, such as creating views, table functions, or stored procedures based on how you want to analyze the results. Additionally, if you create these derived objects, you can selectively grant access on these objects to different roles. For example, a data engineer can access the stored procedures to maintain the approach to obtain the results, and a data analyst can access the view to analyze the results.

The event table is named SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS_RAW.

For information about the event table columns, see Event table columns.

For a representative example to query the event table, see the logging and tracing tutorial.

Option 2: Query the flattened view that is built from the event table

This option enables you to query the DATA_QUALITY_MONITORING_RESULTS view, which flattens the raw data in the event table to enable easier access to the DMF results. Additionally, this option is best when data post-processing is not needed and when you don’t want to grant access to the raw data.

The view exists in the LOCAL schema in the shared SNOWFLAKE database: SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS.

For information, see the DATA_QUALITY_MONITORING_RESULTS view.

Note

The SNOWFLAKE.GOVERNANCE_VIEWER database role does not have access to query the DATA_QUALITY_MONITORING_RESULTS view.

Drop the DMF from the system

Drop the DMF from the system:

DROP FUNCTION governance.dmfs.count_positive_numbers(
  TABLE(
    NUMBER, NUMBER, NUMBER
  )
);
Copy

Limitations

For information, see the section on data quality limitations.