Using the SQL Query template in the web app¶
Including a SQL Query template in a clean room lets consumers write free form SQL to query data in the clean room. When using the SQL Query template, consumer queries must meet certain requirements to successfully return results. These requirements are determined by how the data provider protects their tables with data privacy policies.
Template policies¶
The following policies are applied by the SQL Query template to control how the consumer can query the data:
- Projection policy
Columns with a projection policy cannot be projected (that is, included in a SELECT statement). However, consumers can filter or join on a column with a projection policy.
- Aggregation policy
Requires that a query aggregate data in order to return results.
An aggregation policy protects the privacy of entities by requiring that each aggregation group contain a minimum number of entities. An entity is identified by a distinct value in a column. For example, all rows with the value
joe@company.com
in theemail
column might belong to the same entity.In clean rooms, the aggregation policy can protect multiple entities in the table. For example, the policy might require that an aggregation group contain at least 5 households and 10 users, where households and users are both entities.
Be aware that assigning an aggregation policy to a column allows the consumer to filter on, project, and join on that column.
Important
Join policies specified during the clean room creation process are ignored when using the SQL Query template.
Add a SQL Query template to a clean room¶
The general process of configuring a clean room so consumers can use the SQL Query template to execute analyses includes the following:
Start the clean room creation process.
When prompted to select a template, select Horizontal » SQL Query and then configure the SQL Query template.
Share the clean room with a collaborator.
Configure the SQL Query template¶
Consumer queries cannot select, filter, or join on a column in a SQL Query template unless you assign a policy to a column or fully permit the column.
After you add the SQL Query template to the clean room, you can configure it by performing the following actions:
- Allow a consumer to SELECT, filter, or join on a column without restriction
Use the Fully Permitted Columns drop-down list to select the name of the column.
- Assign an aggregation policy to one or more columns
To assign an aggregation policy to a column, use the Aggregation Policy Columns drop-down list to select the name of the column. After you assign the aggregation policy, the consumer’s queries must aggregate data into groups that contain a minimum number for distinct values from the column.
After you have assigned the aggregation policy to a column, you can adjust how many distinct values (that is, entities) from the column must be included in the aggregation groups returned by the consumer’s query. To adjust the required number of entities in each group:
Find the Privacy Settings » Aggregation Policies section.
Find the column, and then increase or decrease its Threshold. The higher the threshold, the more distinct values must be returned in each aggregation group, which provides stronger privacy protections for the entity.
You can assign the aggregation policy to additional columns to define additional entities, then adjust how many of those entities must be included in each group.
- Assign projection policies to one or more columns
To assign a projection policy to a column, use the Projection Policy Columns drop-down list to select the name of the column. Consumers cannot use a column with a projection policy in a SELECT statement.
Running an analysis using a SQL Query template¶
Use the information in this section to write a successful query in the SQL Query template.
What columns can I join and filter on?¶
You can join and filter on any column that has a policy or is fully permitted. To determine if a column can be joined or used in a filter:
In the Query Configurations section, find the Tables tile.
Use the drop-down list to select a table. You can join and filter on all of the columns listed.
Aggregation requirement¶
If the provider assigned an aggregation policy to a column, all queries executed using the SQL Query template must return aggregated results.
To determine if your query must aggregate results:
In the Query Configurations section, find the Tables tile.
Use the drop-down list to select a table.
Look for columns that have an aggregation policy label. If there is at least one aggregation policy label, you must use an aggregate in your query.
For guidelines on how to write a successful query against data protected by an aggregation policy, see:
Query requirements for aggregation policies. For example, you can use this section to determine that the MIN and MAX aggregation functions do not satisfy the query requirements, and cannot be used.
What columns can I project?¶
Queries executed using the SQL Query template have restrictions on which columns can be projected (used in a SELECT statement).
To determine if your query can project a column:
In the Query Configurations section, find the Tables tile.
Use the drop-down list to select a table.
Look for columns that have a projection policy label, which means you cannot project it. You can project all columns except the ones with the projection policy label.
Display the results¶
To run your query in the web app, select Run, then optionally name and schedule your query run. Check query status in the Analyses and Queries list to see when the query is finished. Select a completed query to see the results. Successful queries display a table of results. If the results table fulfills specific requirements detailed next, the results can also be displayed in a graph. Snowflake offers several graph types, including bar, line, and pie graphs; choose the output format by selecting the appropriate button in the results section.
Graphing requirements¶
In order for Snowflake to be able to generate a graph:
The results table must include at least one measure (numeric) column and one dimension (category) column.
The measure column name must have the following prefix or suffix (case-insensitive):
Column-name prefixes:
COUNT
SUM
AVG
MIN
MAX
OUTPUT
OVERLAP
Column-name suffix:
_OVERLAP
Snowflake generates a chart using the first eligible measure column and the first dimension column in a results table.
Limitations¶
An ORDER BY clause has no effect on how the results of the analysis are displayed.
Sample queries¶
Use this section to better understand what a query can and cannot include when running an analysis with the SQL Query template.
- Queries without an aggregation function
In some circumstances, you can return values without using an aggregation function.
Allowed
Not allowed
SELECT gender, regions FROM TABLE sample_db.demo.customer GROUP BY gender, region;
SELECT gender, regions FROM TABLE sample_db.demo.customer;
- Common table expressions (CTEs)
Allowed
Not allowed
WITH audience AS (SELECT COUNT(DISTINCT t1.hashed_email), t1.status FROM provider_db.overlap.customers t1 JOIN consumer_db.overlap.customers t2 ON t1.hashed_email = t2.hashed_email GROUP BY t1.status); SELECT * FROM audience;
WITH audience AS (SELECT t1.hashed_email, t1.status FROM provider_db.overlap.customers quoted t1 JOIN consumer_db.overlap.customers t2 ON t1.hashed_email = t2.hashed_email GROUP BY t1.status) SELECT * FROM audience
- CREATE, ALTER, TRUNCATE
A query cannot use CREATE, ALTER, or TRUNCATE.
- Query with joins
Allowed
SELECT p.education_level, c.status, AVG(p.days_active), COUNT(DISTINCT p.age_band) FROM sample_database_preprod.demo.customers c INNER JOIN sample_database_preprod.demo.customers p ON c.hashed_email = p.hashed_email GROUP BY ALL;
- DATE_TRUNC
Allowed
SELECT COUNT(*), DATE_TRUNC('week', date_joined) AS week FROM consumer_sample_database.audience_overlap.customers GROUP BY week;
- Quoted identifiers
Allowed
SELECT COUNT(DISTINCT t1.”hashed_email”) FROM provider_sample_database.audience_overlap."customers quoted" t1 INNER JOIN consumer_sample_database.audience_overlap.customers t2 ON t1."hashed_email" = t2.hashed_email;