Apply data quality checks by group¶
When you associate a data metric function (DMF) with a table or view, the DMF evaluates your data and returns a single scalar value for the entire column or table, which is useful for tracking data health at a global level. Some use cases require the same metric broken down by a dimension, such as null counts per region or duplicate counts per product category.
The WITHIN GROUP clause lets you specify one or more columns to segment data when you create a DMF association, so each group gets its own metric result.
Overview¶
When you create a DMF association with a WITHIN GROUP clause, the DMF is evaluated separately for each distinct combination of values in the grouping columns. Results are written as one row per group to the data quality monitoring results view, which makes it easier to identify quality issues within specific segments.
For example, if your customer_data table has data for multiple regions and you want to track
null counts per region, you can group the NULL_COUNT DMF by the region column. Instead of
getting a single count across all regions, you get one row per region in the results.
Supported DMFs¶
The WITHIN GROUP clause is supported for most system DMFs in the SNOWFLAKE.CORE schema (including NULL_COUNT, DUPLICATE_COUNT, and ROW_COUNT) and for most custom DMFs.
The following system DMFs aren’t supported with WITHIN GROUP:
FRESHNESS, because it operates on the table as a whole and doesn’t take column arguments.
REFERENTIAL_INTEGRITY_COUNT, because it joins across two tables and can’t be evaluated within a single group.
Note
Some metrics aren’t additive across groups. For example, AVG and STDDEV per group can’t be combined to derive a table-level equivalent. Keep this in mind when interpreting grouped results.
Custom DMF compatibility¶
Custom DMFs are supported when the DMF body uses a straightforward SQL structure. The following table summarizes compatibility by structure type:
DMF body structure |
Supported |
|---|---|
Single table query |
Yes |
Subquery |
Yes |
FLATTEN |
Yes |
JOIN |
No |
Common table expressions (CTEs) |
No |
UNION or UNION ALL |
No |
DISTINCT |
No |
Window functions |
No |
If you try to use WITHIN GROUP with an unsupported custom DMF structure, an error is returned at association creation time.
Tip
If your custom DMF uses a CTE, UNION, JOIN, DISTINCT, or window function pattern, consider rewriting it as a subquery, or migrate to an equivalent system DMF where possible.
Create an association with grouping¶
To add a DMF association with group-level results, use the ALTER TABLE command with the WITHIN GROUP clause. The clause is added to the same ADD DATA METRIC FUNCTION syntax that you use for non-grouped associations, so all of the existing association properties (such as EXPECTATION and EXECUTE AS ROLE) remain available:
The GROUP LIMIT clause is two keywords without an equals sign, and it must follow the
WITHIN GROUP clause directly.
For the full set of clauses supported by ADD DATA METRIC FUNCTION (including ANOMALY_DETECTION, SENSITIVITY, and chaining additional DMFs in a single statement), see Data metric function actions (dataMetricFunctionAction).
Parameters specific to grouping¶
Parameter |
Description |
|---|---|
|
One or more columns to group by. Each distinct combination of values in these columns produces a separate result row. |
|
Optional. The maximum number of groups allowed per evaluation. Valid values are 1 through 1000 (default: 1000). If the number of distinct groups at evaluation time exceeds this limit, the evaluation fails with an error. |
Note
You can only have one association per DMF, table, and column combination. If an association already exists for a given metric and column combination, you can’t create a second one with a different grouping.
ANOMALY_DETECTION is automatically disabled when a WITHIN GROUP clause is present. For details, see Limitations.
Examples¶
The following example tracks null counts in the name column of the customer_data table,
broken down by region:
To also group by product category and allow up to 500 groups:
Modify or drop a grouped association¶
You can use the existing ALTER TABLE ... MODIFY ... command to update other properties of a
grouped association, such as suspending or resuming it, or changing the expectation. The WITHIN
GROUP clause is set at association creation time and can’t be modified after the fact.
To change the grouping on an existing association, drop the association and re-create it.
To drop a grouped association, use the standard drop syntax. No changes are needed:
View grouped results¶
Grouped DMF results are available in the same views as standard DMF results. When an association uses WITHIN GROUP, each evaluation produces one result row per group.
DATA_QUALITY_MONITORING_RESULTS view¶
Use the DATA_QUALITY_MONITORING_RESULTS view view to query evaluation
results. Grouped results include a new GROUP_BY_INFO column that identifies the group each row
corresponds to.
Column |
Type |
Description |
|---|---|---|
|
ARRAY |
An array of objects, each describing one grouping column and its value for this row. Each
object contains |
Example: Query null counts by region
Example output:
Example: Filter results for a specific group
Example: Compare results across groups from the same evaluation
All rows from a single evaluation share the same reference_id (the association ID) and
measurement_time. Use both columns to scope results to one evaluation run:
View grouping configuration¶
DATA_METRIC_FUNCTION_REFERENCES table function¶
The DATA_METRIC_FUNCTION_REFERENCES table function exposes
grouping details for each association through its PROPERTIES VARIANT column. The two keys
that are specific to WITHIN GROUP are:
Field |
Description |
|---|---|
|
A JSON-encoded string representing the array of column references used in the WITHIN GROUP
clause, in the format |
|
The maximum group limit for this association, or NULL if no grouping is configured. |
For the full list of keys included in the PROPERTIES column, see
DATA_METRIC_FUNCTION_REFERENCES.
For example:
Example output:
The DATA_METRIC_FUNCTION_REFERENCES view view exposes the same
PROPERTIES column.
Extract error rows for a specific group¶
When a grouped DMF flags a quality issue, you might want to inspect the raw rows that contributed
to a specific group’s result. The SYSTEM$DATA_METRIC_SCAN function
accepts an optional WITHIN_GROUP_VALUES argument that filters the scan to a specific group:
Arguments¶
Argument |
Description |
|---|---|
|
A JSON object that maps grouping column names to the specific group values you want to scan.
For example: |
Example: View null rows in the US region
This returns all rows where name IS NULL and region = 'US'.
Use expectations with grouped associations¶
Expectations work with grouped DMF associations. When an expectation is defined on a grouped
association, it’s evaluated independently for each group, and a per-group result is written to
the event table. To inspect per-group expectation results, query the
DATA_QUALITY_MONITORING_EXPECTATION_STATUS view view, which includes a
GROUP_BY_INFO column that identifies the group for each row.
Notifications behave differently from per-group results. For a single evaluation, Snowflake fires at most one notification, based on the worst-group value (the maximum metric value across all groups). If at least one group violates the expectation, you receive one notification for that evaluation. Per-group violation details are still recorded in the event table for follow-up analysis.
To evaluate expectations on demand for a specific group, use
SYSTEM$EVALUATE_DATA_QUALITY_EXPECTATIONS with the
WITHIN_GROUP_VALUES argument, using the same JSON object format as
SYSTEM$DATA_METRIC_SCAN.
For more information about expectations, see Use SQL to work with expectations.
Limitations¶
Custom DMF compatibility. The WITHIN GROUP clause isn’t supported for custom DMFs that use CTEs,
UNION,UNION ALL,JOIN,DISTINCT, or window functions. For details, see Supported DMFs.Schema-level associations. WITHIN GROUP isn’t supported on schema-level DMF associations (
ALTER SCHEMA ... ADD DATA METRIC FUNCTION). To group by a column, create the association at the table or view level.One association per DMF, table, and column combination. Because only one association is allowed per metric, table, and column combination, you can only have one grouping configuration per combination.
Group limit. Valid values for
GROUP LIMITare 1 through 1000, with a default of 1000. If the number of distinct groups at evaluation time exceeds the configured limit, the evaluation fails and no results are written for that run. Reduce cardinality in your grouping column, or raiseGROUP LIMITup to the maximum of 1000 to avoid this error.Immutable grouping configuration. The grouping columns and group limit can’t be changed after an association is created. Drop and re-create the association to change the grouping.
Anomaly detection. The per-association
ANOMALY_DETECTIONproperty and the system anomaly DMFs in theSNOWFLAKE.COREschema (such asROW_COUNTandFRESHNESSwith anomaly detection enabled) aren’t supported with WITHIN GROUP. TheANOMALY_DETECTIONproperty is automatically disabled when a WITHIN GROUP clause is present. Full anomaly detection support for groups is planned for a future release.