Remediation of data quality issues¶
Data quality metrics (DMFs) let you identify how many records in a table might contain quality issues. For example, the SNOWFLAKE.CORE.NULL_COUNT can identify how many records contain a NULL value in a specific column.
To help you fix these possible quality issues, you can call the SYSTEM$DATA_METRIC_SCAN system function to return the individual records identified by the DMF as containing data that failed a data quality check. For example, if you pass the NULL_COUNT DMF into the SYSTEM$DATA_METRIC_SCAN function as an argument, then you can obtain the actual records that contain a NULL value, not just the number of records that contain a NULL value.
Supported DMFs¶
The SYSTEM$DATA_METRIC_SCAN function accepts a DMF as an argument to return the records identified by the DMF as containing problematic data. The following system DMFs can be used as arguments:
SNOWFLAKE.CORE.NULL_COUNT
SNOWFLAKE.CORE.NULL_PERCENT
SNOWFLAKE.CORE.BLANK_COUNT
SNOWFLAKE.CORE.BLANK_PERCENT
SNOWFLAKE.CORE.DUPLICATE_COUNT
Limitations and considerations¶
You cannot use custom DMFs as an argument of the SYSTEM$DATA_METRIC_SCAN function.
If a table is protected by a policy, such as a masking policy or row access policy, the SYSTEM$DATA_METRIC_SCAN function might return unexpected or incomplete data because results depend on the user’s role when executing the function.
Calling the SYSTEM$DATA_METRIC_SCAN function¶
When you call the SYSTEM$DATA_METRIC_SCAN function, it analyses a table with a DMF to identify possible data quality issues. You must pass in the following arguments to the SYSTEM$DATA_METRIC_SCAN function: the name of the table, the DMF, and any arguments being passed to the DMF to help identify problematic records.
For example, given that the SNOWFLAKE.CORE.NULL_COUNT system metric function returns the total number of NULL values in a particular column, the following returns
the rows of the employeesTable
table that have NULL values in the SSN
column in the version of the table that existed at a specific time in the past.
SELECT *
FROM TABLE(SYSTEM$DATA_METRIC_SCAN(
REF_ENTITY_NAME => '<governance.sch.employeesTable>'
,METRIC_NAME => '<snowflake.core.null_count>'
,ARGUMENT_NAME => '<SSN>'
,AT_TIMESTAMP => '<2024-08-28 02:00:00 -0700>'
));
To check the results of a DMF evaluation on the table or view in the past, you can pass the AT_TIMESTAMP argument. The AT_TIMESTAMP argument allows you to use Time Travel to cast the timestamp string to return only those records that existed in the table at the ‘2024-08-28 02:00:00 -0700’ timestamp.
Using SYSTEM$DATA_METRIC_SCAN to fix data¶
The SYSTEM$DATA_METRIC_SCAN function is a table function that returns a set of rows. The output of the function can be used within a DML statement to take action on the records that have been identified as containing data that failed a data quality check.
Suppose you want to replace blank values in the email
column of the t
table with NULL values. Because the BLANK_COUNT data metric function identifies blank values, you could run the following statement:
UPDATE T
SET email = null
WHERE T.ID IN (SELECT ID FROM TABLE(SYSTEM$DATA_METRIC_SCAN(
REF_ENTITY_NAME => '<t>'
,METRIC_NAME => '<snowflake.core.blank_count>'
,ARGUMENT_NAME => '<email>'
)));