Using the SQL Query template in the web app

Including a SQL Query template in a clean room lets consumers write free form SQL to query data in the clean room. When using the SQL Query template, consumer queries must meet certain requirements to successfully return results. These requirements are determined by how the data provider protects their tables with data privacy policies.

About policies

The following types of policies are important to understand within the context of a clean room that includes a SQL Query template, both for the provider who is protecting their data and for the consumer who is querying that data.

Policy

Description

Projection policy

Controls whether a query can project a column, that is, include the column in a SELECT statement. In a clean room, consumers cannot project a column with a projection policy.

Aggregation policy

Requires that a query aggregate data in order to return results. Every table accessed with a SQL Query template has an aggregation policy.

An aggregation policy protects the privacy of entities by requiring that each aggregation group contain a minimum number of entities. An entity is identified by a distinct value in a column. For example, all rows with the value joe@company.com in the email column might belong to the same entity.

In clean rooms, the aggregation policy can protect multiple entities in the table. For example, the policy might require that an aggregation group contain at least 5 households and 10 users, where households and users are both entities.

Important

Join policies, which are added to the clean room in step 2 of the creation process, do not control joins when using the SQL Query template. Rather, the clean room applies a projection policy and aggregation policy to each column selected as a join column. A SQL query can join on any column, not just those selected with the Join Columns drop-down list.

Add a SQL Query template to a clean room

The general process of configuring a clean room so consumers can use the SQL Query template to execute analyses includes the following:

  1. Start the clean room creation process.

  2. In step 2, use the Join Columns drop-down list to select columns that you want to protect with aggregation and projection policies. Your selections will have no effect on which columns can be used as join keys.

  3. In step 3, Horizontal » SQL Query from the list of templates, then configure the SQL Query template.

  4. Share the clean room with a collaborator.

Configure the SQL Query template

After you add the SQL Query template to the clean room, you can configure it by performing the following actions:

Allow consumer to use a join column in a SELECT statement

Use the Aggregation Policy Columns drop-down list to select the name of the column, which is labeled with Join Policy.

Allow consumer to SELECT and filter on a non-join column

Use the Fully Permitted Columns drop-down list to select the name of the column.

Define additional entities

Every column with a join policy identifies an entity protected by the table’s aggregation policy. This means that every aggregation group must contain a minimum number of distinct values from every join column.

You can designate additional columns as identifying entities so each aggregation group must contain a minimum number of distinct values from those columns. For example, suppose HEM is a join policy. The default configuration of the aggregation policy forces queries to aggregate data into groups that contain a certain amount of distinct values from the HEM column. Now suppose you add the FIRST_NAME column as an aggregation policy column. Now, each aggregation group must contain a certain number for distinct values from the HEM columns and a certain number of distinct values from the FIRST_NAME column.

To add a column that identifies an entity, use the Aggregation Policy Columns drop-down list to select the name of the column.

Note

Be aware that specifying that a column identifies an entity also allows the consumer to filter on and project that column.

Change the minimum number of entities

When a column identifies an entity, each aggregation group must contain a certain number of distinct values from the column. To adjust the number of distinct values of an entity:

  1. Find the Privacy Settings » Aggregation Policies section.

  2. Find the column, and then increase or decrease its Threshold. The higher the threshold, the more distinct values must be returned in each aggregation group, which provides stronger privacy protections for the entity.

Running an analysis using a SQL Query template

Use the information in this section to write a successful query in the SQL Query template.

What columns can I join on?

You can join on any column.

Aggregation requirement

Every table in a clean room that uses the SQL Query template has an aggregation policy. As a result, all queries executed using the SQL Query template must return aggregated results.

For guidelines on how to write a successful query against data protected by an aggregation policy, see:

What columns can I project?

Queries executed using the SQL Query template have restrictions on which columns can be projected, that is, used in a SELECT statement.

To determine if your query can project a column:

  1. In the Query Configurations section, find the Tables tile.

  2. Use the drop-down list to select a table.

  3. Look for columns that have a projection policy label. You can project all of the listed columns except the ones with a projection policy.

What columns can I filter on?

To determine which columns your query can filter on:

  1. In the Query Configurations section, find the Tables tile.

  2. Use the drop-down list to select a table and review the columns. You can filter on all of the listed columns.

Limitations

  • An ORDER BY clause has no effect on how the results of the analysis are displayed.

Sample queries

Use this section to better understand what a query can and cannot include when running an analysis with the SQL Query template.

Queries without an aggregation function

In some circumstances, you can return values without using an aggregation function.

Allowed

Not allowed

SELECT gender, regions
  FROM TABLE sample_db.demo.customer
  GROUP BY gender, region;
Copy
SELECT gender, regions
  FROM TABLE sample_db.demo.customer;
Copy
Common table expressions (CTEs)

Allowed

Not allowed

WITH audience AS
  (SELECT COUNT(DISTINCT t1.hashed_email),
    t1.status
    FROM provider_db.overlap.customers t1
    JOIN consumer_db.overlap.customers t2
      ON t1.hashed_email = t2.hashed_email
    GROUP BY t1.status);

SELECT * FROM audience;
Copy
WITH audience AS
  (SELECT t1.hashed_email,
    t1.status
    FROM provider_db.overlap.customers quoted t1
    JOIN consumer_db.overlap.customers t2
      ON t1.hashed_email = t2.hashed_email
    GROUP BY t1.status)

SELECT * FROM audience
Copy
CREATE, ALTER, TRUNCATE

A query cannot use CREATE, ALTER, or TRUNCATE.

Query with joins

Allowed

SELECT p.education_level,
  c.status,
  AVG(p.days_active),
  COUNT(DISTINCT p.age_band)
  FROM  sample_database_preprod.demo.customers c
  INNER JOIN
  sample_database_preprod.demo.customers p
    ON  c.hashed_email = p.hashed_email
  GROUP BY ALL;
Copy
DATE_TRUNC

Allowed

SELECT COUNT(*),
  DATE_TRUNC('week', date_joined) AS week
  FROM consumer_sample_database.audience_overlap.customers
  GROUP BY week;
Copy
Quoted identifiers

Allowed

SELECT COUNT(DISTINCT t1.”hashed_email”)
  FROM provider_sample_database.audience_overlap."customers quoted" t1
  INNER JOIN
  consumer_sample_database.audience_overlap.customers t2
    ON t1."hashed_email" = t2.hashed_email;
Copy