Implementing entity-level privacy with aggregation policies

Entity-level privacy strengthens the privacy protections provided by aggregation policies. With entity-level privacy, Snowflake can ensure that an aggregation group contains a certain number of entities, not just a certain number of rows.

The majority of tasks and considerations related to aggregation policies are the same regardless of whether you are implementing entity-level privacy. For general information about working with aggregation policies, see Aggregation policies.

About entity-level privacy

An entity refers to a set of attributes that belong to a logical object (for example, a user profile or household information). These attributes can be used to identify an entity within a dataset. Entity-level privacy is a feature of privacy-enhancing technologies (PET) that protects the privacy of an entity that is stored in a shared dataset. It ensures that queries cannot expose sensitive attributes of an entity, even if those attributes are found in multiple records.

To achieve entity-level privacy, Snowflake allows you to specify which attributes can be used to identify an entity (an entity key). This lets Snowflake identify all of the records that belong to a particular entity within a dataset. For example, if the entity key is defined as the column email, then Snowflake can determine that all records where email=joe.smith@example.com belong to the same entity.

Aggregation policies without entity-level privacy

By default, aggregation policies require analysts to run queries that aggregate data rather than retrieving individual rows, thereby achieving row-level privacy. However, row-level privacy does not prevent a query from exposing attributes of an entity when those attributes are found in multiple rows (for example, in a table containing transactional data).

For example, suppose a streaming service, ActonViz, has a transactional table that records the email address (user_id) and household (household_id) of a viewer each time they watch a show.

user_id               household_id   program_id   watch_time   start_time
dave_sr@company.com   12345          1            29           2023-09-12 09:00
mary@bazco.com        23485          1            30           2023-09-12 09:00
dave_sr@company.com   12345          6            18           2023-09-11 13:00
joe@jupiterlink.com   85456          6            25           2023-09-15 22:00
junior@example.com    12345          5            30           2023-09-13 11:00

ActonViz can use an aggregation policy to force advertisers who query this table to aggregate data into groups that contain at least 2 records. This prevents the advertisers from retrieving data from an individual record (row-level privacy). If each viewer and household appeared only once in the table, that would be enough to protect their privacy.

However, because viewers and households appear in more than one row, an advertiser’s query could still learn information about an individual viewer or household. A query could create a group that consists entirely of records from household 12345 or, even worse, a group that consists entirely of records for viewer dave_sr@company.com. In both cases, the number of records in the group would meet the requirements set by ActonViz (minimum of 2 records per group).
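To make the gap concrete, the following sketch shows the kind of queries an advertiser could run. It assumes that the sample table is named viewership_log (the name used for the example table later in this topic) and that an aggregation policy with a minimum group size of 2 is assigned to it without an entity key.

```sql
-- Assumption: viewership_log holds the sample rows shown above and has an
-- aggregation policy with MIN_GROUP_SIZE => 2 and no entity key.

-- Groups by household: the group for household 12345 contains 3 rows.
SELECT household_id, AVG(watch_time) AS avg_watch_time
  FROM viewership_log
  GROUP BY household_id;

-- Groups by viewer: the group for dave_sr@company.com contains 2 rows.
SELECT user_id, AVG(watch_time) AS avg_watch_time
  FROM viewership_log
  GROUP BY user_id;
```

Both of these groups satisfy the 2-record minimum, so each query can return an average that describes a single household or a single viewer.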

Aggregation policies with entity-level privacy

To achieve entity-level privacy, Snowflake allows you to specify an entity key when assigning an aggregation policy to a table or view. After the entity key is defined, the groups returned by a query against an aggregation-constrained table or view must contain the specified number of entities, not just a specified number of rows.

In the preceding example, suppose ActonViz defines household_id as the entity key because it uniquely identifies each household. The privacy of each household is now preserved. Before the change, a group could consist entirely of records where household_id = 12345, but now it must contain at least two distinct values of household_id.
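As a rough sketch, ActonViz could set this up with the commands described later in this topic. The policy name actonviz_policy and the table name viewership_log are assumptions made for illustration, and the example assumes that no aggregation policy is currently assigned to the table.

```sql
-- Assumption: actonviz_policy is a hypothetical policy name.
-- With an entity key, the minimum group size of 2 counts entities, not rows.
CREATE AGGREGATION POLICY actonviz_policy
  AS () RETURNS AGGREGATION_CONSTRAINT ->
  AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 2);

-- household_id becomes the entity key, so each aggregation group must
-- contain at least 2 distinct households.
ALTER TABLE viewership_log
  SET AGGREGATION POLICY actonviz_policy
  ENTITY KEY (household_id);
```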

Note that the entity key is not always the same as the primary key of a table. In this example, the table might use user_id as the primary key because it uniquely identifies a viewer. But in this case, ActonViz wants to protect the privacy of an entire household, which consists of multiple viewers, so they chose household_id as the entity key.

About minimum group sizes

Every aggregation policy specifies a minimum group size. Without entity-level privacy, the minimum group size defines the number of records that must be included in an aggregation group. When an entity key is specified, the minimum group size defines how many entities must be included in an aggregation group.
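One way to see the difference between the two interpretations is to compare the row count of each group with its count of distinct entity-key values. The following query is an illustrative check that a data owner might run against the sample data before assigning a policy; it is not a description of how Snowflake enforces the policy internally. The table name viewership_log is assumed, and household_id is treated as the entity key.

```sql
-- Assumption: viewership_log holds the sample rows from the ActonViz example
-- and does not yet have an aggregation policy assigned.
SELECT user_id,
       COUNT(*) AS row_count,                        -- rows in the group
       COUNT(DISTINCT household_id) AS entity_count  -- entities in the group
  FROM viewership_log
  GROUP BY user_id;
```

For the sample rows, the group for dave_sr@company.com has a row_count of 2 but an entity_count of 1, so it would meet a minimum group size of 2 only when no entity key is defined.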

The following column-level policies do not affect how Snowflake calculates whether there are enough entities in an aggregation group:

  • Projection policies have no effect.

  • Masking policies have no effect. When a masking policy is assigned to the GROUP BY column, the aggregation groups formed by the query are based on the values returned by the masking policy. Each of these groups must have enough entities.

In cases where name references are used several times (for example, in JOIN or UNION operators), Snowflake enforces the minimum group size for each name reference of each dataset separately. This applies even when the reference points to the same dataset several times.

Enforce entity-level privacy with aggregation policies

To enforce entity-level privacy with aggregation policies, do the following:

  1. When executing the CREATE AGGREGATION POLICY command to create the aggregation policy, specify the number of entities that must be included in each aggregation group.

  2. Define the entity key when assigning the aggregation policy to a table or view.

Specify the minimum number of entities

The syntax for creating an aggregation policy with CREATE AGGREGATION POLICY does not change if you are using an entity key to achieve entity-level privacy. You still use the MIN_GROUP_SIZE argument of the AGGREGATION_CONSTRAINT function to specify a minimum group size. As soon as you define an entity key, the minimum group size changes from a requirement on the number of records in a group to the number of entities in a group.

For example, the following code creates an aggregation policy that has a minimum group size of 5. As long as you define an entity key when assigning the policy to a table, each aggregation group must contain at least 5 entities.

CREATE AGGREGATION POLICY my_agg_policy
  AS () RETURNS AGGREGATION_CONSTRAINT ->
  AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 5);

For complete details about creating aggregation policies, including an example of a conditional aggregation policy that enforces different restrictions under different circumstances, see Create an aggregation policy.

Define an entity key

You define an entity key when you assign the aggregation policy to a table or view. You can define the entity key when creating a new table or view, or when updating an existing table or view.

Define an entity key for existing tables and views

When executing the ALTER TABLE … SET AGGREGATION POLICY command or the ALTER VIEW … SET AGGREGATION POLICY command to assign the aggregation policy, use the ENTITY KEY clause to specify which columns in the table or view contain the identifying attributes of an entity (that is, the entity key).

For example, to create an entity key while assigning an aggregation policy my_agg_policy to a table viewership_log, execute:

ALTER TABLE viewership_log
  SET AGGREGATION POLICY my_agg_policy
  ENTITY KEY (first_name,last_name);

Because columns first_name and last_name are the entity key, the aggregation policy can determine that all rows where first_name = joe and last_name = peterbilt belong to the same entity.
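The same clause can be used with ALTER VIEW. The following is a minimal sketch; viewership_view is a hypothetical view name, assumed to expose first_name and last_name columns.

```sql
-- Assumption: viewership_view is a hypothetical view that exposes
-- first_name and last_name columns.
ALTER VIEW viewership_view
  SET AGGREGATION POLICY my_agg_policy
  ENTITY KEY (first_name, last_name);
```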

Specify an entity key for new tables and views

When executing the CREATE TABLE … WITH AGGREGATION POLICY command or the CREATE VIEW … WITH AGGREGATION POLICY command to assign the aggregation policy, use the ENTITY KEY clause to specify which columns in the table or view contain the identifying attributes of an entity.

For example, to create a new table t1 while assigning an aggregation policy and defining an entity key, execute:

CREATE TABLE t1 (first_name VARCHAR, last_name VARCHAR)  -- illustrative columns
  WITH AGGREGATION POLICY my_agg_policy
  ENTITY KEY (first_name, last_name);

Because columns first_name and last_name are the entity key, the aggregation policy can determine that all rows where first_name = joe and last_name = peterbilt belong to the same entity.
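A view can be created in a similar way. In the following sketch, the view name v1, the base table customers, and the purchase_amount column are all hypothetical names chosen for illustration.

```sql
-- Assumptions: v1 and customers are hypothetical names; customers is assumed
-- to have first_name, last_name, and purchase_amount columns.
CREATE VIEW v1
  WITH AGGREGATION POLICY my_agg_policy
  ENTITY KEY (first_name, last_name)
  AS SELECT first_name, last_name, purchase_amount
       FROM customers;
```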

Querying an aggregation-constrained table

The requirements for querying an aggregation-constrained table that has an entity key are the same as those for querying a table without one. For information about what types of queries conform to these requirements, see Query requirements.
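As an illustration, a query of the following shape aggregates data instead of retrieving individual rows, which is the kind of query that aggregation policies are designed to allow. The table name viewership_log and its columns are taken from the ActonViz example and are assumptions here; Query requirements remains the authoritative list of what is permitted.

```sql
-- Assumption: viewership_log has an aggregation policy assigned with
-- household_id as the entity key.
-- Each program_id group must contain at least the minimum number of
-- distinct households defined by the policy.
SELECT program_id, AVG(watch_time) AS avg_watch_time
  FROM viewership_log
  GROUP BY program_id;
```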