Using the SQL Query template in the web app

Including a SQL Query template in a clean room lets consumers write free form SQL to query data in the clean room. When using the SQL Query template, consumer queries must meet certain requirements to successfully return results. These requirements are determined by how the data provider protects their tables with data privacy policies.

Template policies

The following policies are applied by the SQL Query template to control how the consumer can query the data:

Projection policy

Columns with a projection policy cannot be projected (that is, included in a SELECT statement). However, consumers can filter or join on a column with a projection policy.

Aggregation policy

Requires that a query aggregate data in order to return results.

An aggregation policy protects the privacy of entities by requiring that each aggregation group contain a minimum number of entities. An entity is identified by a distinct value in a column. For example, all rows with the value joe@company.com in the email column might belong to the same entity.

In clean rooms, the aggregation policy can protect multiple entities in the table. For example, the policy might require that an aggregation group contain at least 5 households and 10 users, where households and users are both entities.

Be aware that assigning an aggregation policy to a column allows the consumer to filter on, project, and join on that column.

Important

Join policies specified during the clean room creation process are ignored when using the SQL Query template.

Add a SQL Query template to a clean room

The general process of configuring a clean room so consumers can use the SQL Query template to execute analyses includes the following:

  1. Start the clean room creation process.

  2. When prompted to select a template, select Horizontal » SQL Query and then configure the SQL Query template.

  3. Share the clean room with a collaborator.

Configure the SQL Query template

Consumer queries cannot select, filter, or join on a column in a SQL Query template unless you assign a policy to a column or fully permit the column.

After you add the SQL Query template to the clean room, you can configure it by performing the following actions:

Allow a consumer to SELECT, filter, or join on a column without restriction

Use the Fully Permitted Columns drop-down list to select the name of the column.

Assign an aggregation policy to one or more columns

To assign an aggregation policy to a column, use the Aggregation Policy Columns drop-down list to select the name of the column. After you assign the aggregation policy, the consumer’s queries must aggregate data into groups that contain a minimum number for distinct values from the column.

After you have assigned the aggregation policy to a column, you can adjust how many distinct values (that is, entities) from the column must be included in the aggregation groups returned by the consumer’s query. To adjust the required number of entities in each group:

  1. Find the Privacy Settings » Aggregation Policies section.

  2. Find the column, and then increase or decrease its Threshold. The higher the threshold, the more distinct values must be returned in each aggregation group, which provides stronger privacy protections for the entity.

You can assign the aggregation policy to additional columns to define additional entities, then adjust how many of those entities must be included in each group.

Assign projection policies to one or more columns

To assign a projection policy to a column, use the Projection Policy Columns drop-down list to select the name of the column. Consumers cannot use a column with a projection policy in a SELECT statement.

Running an analysis using a SQL Query template

Use the information in this section to write a successful query in the SQL Query template.

What columns can I join and filter on?

You can join and filter on any column that has a policy or is fully permitted. To determine if a column can be joined or used in a filter:

  1. In the Query Configurations section, find the Tables tile.

  2. Use the drop-down list to select a table. You can join and filter on all of the columns listed.

Aggregation requirement

If the provider assigned an aggregation policy to a column, all queries executed using the SQL Query template must return aggregated results.

To determine if your query must aggregate results:

  1. In the Query Configurations section, find the Tables tile.

  2. Use the drop-down list to select a table.

  3. Look for columns that have an aggregation policy label. If there is at least one aggregation policy label, you must use an aggregate in your query.

For guidelines on how to write a successful query against data protected by an aggregation policy, see:

What columns can I project?

Queries executed using the SQL Query template have restrictions on which columns can be projected (used in a SELECT statement).

To determine if your query can project a column:

  1. In the Query Configurations section, find the Tables tile.

  2. Use the drop-down list to select a table.

  3. Look for columns that have a projection policy label, which means you cannot project it. You can project all columns except the ones with the projection policy label.

Display the results

To run your query in the web app, select Run, then optionally name and schedule your query run. Check query status in the Analyses and Queries list to see when the query is finished. Select a completed query to see the results. Successful queries display a table of results. If the results table fulfills specific requirements detailed next, the results can also be displayed in a graph. Snowflake offers several graph types, including bar, line, and pie graphs; choose the output format by selecting the appropriate button in the results section.

Graphing requirements

In order for Snowflake to be able to generate a graph:

  • The results table must include at least one measure (numeric) column and one dimension (category) column.

  • The measure column name must have the following prefix or suffix (case-insensitive):

    • Column-name prefixes:

      • COUNT

      • SUM

      • AVG

      • MIN

      • MAX

      • OUTPUT

      • OVERLAP

    • Column-name suffix:

      • _OVERLAP

Snowflake generates a chart using the first eligible measure column and the first dimension column in a results table.

Limitations

  • An ORDER BY clause has no effect on how the results of the analysis are displayed.

Sample queries

Use this section to better understand what a query can and cannot include when running an analysis with the SQL Query template.

Queries without an aggregation function

In some circumstances, you can return values without using an aggregation function.

Allowed

Not allowed

SELECT gender, regions
  FROM TABLE sample_db.demo.customer
  GROUP BY gender, region;
Copy
SELECT gender, regions
  FROM TABLE sample_db.demo.customer;
Copy
Common table expressions (CTEs)

Allowed

Not allowed

WITH audience AS
  (SELECT COUNT(DISTINCT t1.hashed_email),
    t1.status
    FROM provider_db.overlap.customers t1
    JOIN consumer_db.overlap.customers t2
      ON t1.hashed_email = t2.hashed_email
    GROUP BY t1.status);

SELECT * FROM audience;
Copy
WITH audience AS
  (SELECT t1.hashed_email,
    t1.status
    FROM provider_db.overlap.customers quoted t1
    JOIN consumer_db.overlap.customers t2
      ON t1.hashed_email = t2.hashed_email
    GROUP BY t1.status)

SELECT * FROM audience
Copy
CREATE, ALTER, TRUNCATE

A query cannot use CREATE, ALTER, or TRUNCATE.

Query with joins

Allowed

SELECT p.education_level,
  c.status,
  AVG(p.days_active),
  COUNT(DISTINCT p.age_band)
  FROM  sample_database_preprod.demo.customers c
  INNER JOIN
  sample_database_preprod.demo.customers p
    ON  c.hashed_email = p.hashed_email
  GROUP BY ALL;
Copy
DATE_TRUNC

Allowed

SELECT COUNT(*),
  DATE_TRUNC('week', date_joined) AS week
  FROM consumer_sample_database.audience_overlap.customers
  GROUP BY week;
Copy
Quoted identifiers

Allowed

SELECT COUNT(DISTINCT t1.”hashed_email”)
  FROM provider_sample_database.audience_overlap."customers quoted" t1
  INNER JOIN
  consumer_sample_database.audience_overlap.customers t2
    ON t1."hashed_email" = t2.hashed_email;
Copy