Design custom templates¶
About clean room templates¶
Clean room templates are written in JinjaSQL. JinjaSQL is an extension to the Jinja templating language. A JinjaSQL template evaluates to a SQL statement when run in a clean room. The JinjaSQL templating language provides logic statements and run-time variable replacement, which enables the template to be customized at run time. For example, a user can provide table and column names when they run the template, and the template can adjust itself based on the values passed in.
There are two general types of templates:
Analysis templates, which evaluate to a SQL DQL statement (a SELECT statement) that returns query results immediately to the template runner.
Activation templates, which are used to activate results to a Snowflake account, rather than showing results in the immediate environment. An activation template is very similar to an analysis template with a few extra requirements, and it evaluates to a DDL statement (CREATE TABLE).
Creating, sharing, and running a custom template¶
Any collaborator can register and share templates with specific analysis runners in a collaboration.
Let’s start by looking at a simple SQL query, and how it would be written as a template.
1. The JinjaSQL template¶
Here is a simple SQL query that joins two tables by email and shows the overlap count per city:
Here is how that query would look as a JinjaSQL template that allows the caller to choose the JOIN and GROUP BY columns, as well as the tables used. The template includes some filters that enforce Snowflake Data Clean Room policies.
Notes on the template:
Values within {{ double bracket pairs }} are variables. The values are populated by the caller.
group_by_col,source_table,p1_join_col, andp2_join_colare all variables populated by the caller. These variables have arbitrary names chosen by the template designer.source_tableis a standard Snowflake-defined variable. This variable defines the views to use in the query. These views are datasets within data offerings that are linked into the clean room. Collaborators can list available datasets by calling VIEW_DATA_OFFERINGS.A dataset must be aliased as lowercase
pif you want to enforce Snowflake Data Clean Room policies on it. If a template uses multiple datasets, the first isporp1, and additional datasets are indexed asp2,p3, and so on.IDENTIFIER is needed for all column and table names, because variables in {{ double brackets }} evaluate to string literals, which aren’t valid identifiers.
JinjaSQL filters are applied to columns to enforce Snowflake Data Clean Room policies on the column. Snowflake implements custom filters
join_policyandcolumn_policy, which verify whether a column complies with join or column policies in the clean room respectively, and fail the query if it doesn’t. A filter is applied to a column name as{{ column_name | filter_name }}.
All these points will be discussed in detail later.
2. The Collaboration template¶
A template is added to a collaboration by embedding it in a YAML specification and registering it, then linking it.
You must request to share a template with a given analysis runner, who can accept or reject the request. Additionally, all data providers for that analysis runner must accept the request for the template to be shared.
3. Running the template¶
Here is how an analysis runner might run this template in code. Note how column names are qualified by the table aliases declared in the template.
Developing a custom template¶
Clean room templates are JinjaSQL templates. To create a template, you should be familiar with the following topics:
You can use Cortex Code to validate the SQL output of your JinjaSQL templates based on variable inputs that should be provided. See example prompts below that you can copy into Cortex Code to get final SQL outputs you can test:
Example:
The rendered template looks like this:
Try running the SQL statement above in your environment to see if it works and gets the expected results.
Then test your template without a WHERE clause:
Rendered template:
Add the template into your clean room, and test with an analysis run spec.
Data protection¶
Templates can access only datasets linked into the clean room by collaborators.
Collaborators specify join, column, and activation policies on their datasets to ensure that only those columns can be used as an input for a template variable.
Important
The template must include the appropriate JinjaSQL policy filter on a column for the policy to be applied.
Custom template syntax¶
Snowflake Data Clean Rooms supports V3 JinjaSQL, with a few extensions as noted.
This section includes the following topics:
Template naming rules¶
When creating a template, names must contain only letters, numbers, or underscores.
Template names are assigned in the template specification’s name field when you register the template.
Example valid names:
my_templateactivation_template_1
Example invalid names:
my template- Spaces not allowedmy_template!- Special characters not allowed
Template variables¶
Template callers can pass in values to template variables. JinjaSQL syntax enables variable binding for any variable name within {{ double_brackets }}, but Snowflake reserves a few variable names that you shouldn’t override, as described below.
Caution
All variables, whether Snowflake-defined or custom, are populated by the user and should be treated with appropriate caution. Analysis templates must resolve to a single SELECT statement (activation templates resolve to a script block). Remember that all variables are passed in by the caller.
Snowflake-defined variables¶
All clean room templates have access to the following global variables defined by Snowflake, but passed in by the analysis runner:
source_table:A zero-based string array of tables and views from data offerings linked into the collaboration via LINK_DATA_OFFERING that can be used by the template.
Example:
SELECT col1 FROM IDENTIFIER({{ source_table[0] }}) AS p;my_table:In a Collaboration clean room,
my_tableis used only by Snowflake Standard Edition users. For these users,my_tableis a zero-based string array of datasets that the analysis runner linked by calling LINK_LOCAL_DATA_OFFERING.Example:
SELECT col1 FROM IDENTIFIER({{ my_table[0] }}) AS c;
Custom variables¶
Template creators can include arbitrary variables in a template that can be populated by the analysis runner. These variables can have any Jinja-compliant name except for the Snowflake-defined variables or table alias names. You should provide guidance in the parameter section of the template for required and optional variables.
Custom variables can be accessed by your template, as shown here for the custom variable max_income:
Analysis runners pass variables when calling RUN as defined in the analysis run spec.
Resolving variables correctly¶
String values passed into the template resolve to a string literal in the final template. This can cause SQL parsing or logical errors if you don’t handle bound variables appropriately:
SELECT {{ my_col }} FROM p;- This resolves toSELECT 'my_col' from p;which simply returns the string “my_col” - probably not what you want.SELECT age FROM {{ source_table[0] }} AS p;- This resolves toSELECT age FROM 'somedb.somesch.source_table' AS p;, which causes a parsing error because a table must be an identifier, not a literal string.SELECT age FROM IDENTIFIER({{ source_table[0] }}) AS p {{ where_clause }};- Passing in “WHERE age < 50” evaluates toSELECT age FROM mytable AS p 'WHERE age < 50';, which is a parsing error because of the literal string WHERE clause.
Therefore, where appropriate, you must resolve variables. Here is how to resolve variables properly in your template:
- Resolving table and column names
Variables that specify table or column names must be converted to identifiers in your template in one of two ways:
IDENTIFIER: For example:
SELECT IDENTIFIER({{ my_column }}) FROM p;sqlsafe: This JinjaSQL filter resolves identifier strings to SQL text. An equivalent statement to the previous bullet is
SELECT {{ my_column | sqlsafe }} FROM p;
Your particular usage dictates when to use IDENTIFIER or
sqlsafe. For example,p.{{ my_column | sqlsafe }}can’t easily be rewritten using IDENTIFIER.- Resolving dynamic SQL
When you have a string variable that should be used as literal SQL, such as a WHERE clause, use the
sqlsafefilter in your template. For example:If a user passes in “age < 50” to
where_clause, the query would resolve toSELECT age FROM sometable AS p WHERE 'age < 50';which is invalid SQL because of the literal string WHERE condition. In this case, you should use thesqlsafefilter:
Required table aliases¶
At the top level of your query, all source_table datasets must be aliased as p, and all my_table datasets must be aliased as
c, in order for Snowflake to validate join and column policies correctly in the query. Any column that must be verified against join
or column policies must be qualified with the lowercase p or c table alias.
If you use multiple source_table or my_table datasets in your query, add a numeric, sequential 1-based suffix to each table alias
after the first. So: p or p1, p2, p3, and so on for the first, second, and third source_table datasets, and c or c1, c2,
c3, and so on for the first, second, and third my_table datasets. The p or c index should be sequential without gaps (that
is, create the aliases p1, p2, and p3, not p1, p2, and p4).
Example
Custom clean room template filters¶
Snowflake supports all the standard Jinja filters and most of the standard JinjaSQL filters, along with a few extensions:
join_policy:Succeeds if the column is in the join policy of the data owner; fails otherwise. See Applying data protection policies to data offerings.
column_policy:Succeeds if the column is in the column policy of the data owner; fails otherwise. See Applying data protection policies to data offerings.
activation_policy:Succeeds if the column is in the activation policy of the data owner; fails otherwise. See Applying data protection policies to data offerings.
join_and_column_policy:Succeeds if the column is in the join or column policy of the data owner; fails otherwise. See Applying data protection policies to data offerings.
identifier:This JinjaSQL filter is not supported by Snowflake templates.
Tip
JinjaSQL statements are evaluated left to right:
{{ my_col | column_policy }}Correct{{ my_col | sqlsafe | column_policy }}Correct{{ column_policy | my_col }}Incorrect{{ my_col | column_policy | sqlsafe }}Incorrect:column_policywill be checked against themy_colvalue as a string, which is an error.
Enforcing clean room policies¶
Clean rooms don’t automatically check clean room policies against columns used in a template. If you want to enforce a policy against a column:
You must apply the appropriate policy filter to that column in the template. For example:
You must alias the table as lowercase
porc. See Required table aliases.
Policies are checked only against columns of tables referenced in a source_table variable, which refer to views shared within the clean room. Policies are not checked against columns of tables referenced in a my_table variable, which are local tables not shared within the clean room.
Note that column names can’t be ambiguous when testing policies. So if you have columns with the same name in two tables, you must qualify the column name in order to test the policy against that column.
Access considerations and best practices¶
A template is always executed in context to the clean room application role. A collaborator does not have direct access to any data within the clean room that is restricted to template access only; all access is through the native application roles and the template outputs.
As best practice, you should follow the below for templates you create or use in a clean room:
Ensure a policy filter is applied any time a column variable is used in a template, so that collaborator policies are respected.
Wrap user-provided variables with IDENTIFIER() when possible to strengthen templates against SQL injection attacks.
Activation templates¶
A template can also be used to save query results to a table outside of the clean room; this is called activation. An activation template is an analysis template with the following additional requirements:
Activation templates are JinjaSQL statements that evaluate to a SQL script block, unlike analysis templates, which can be simple SELECT statements.
Activation templates must create an internal table in the clean room to store results. The table generated by the template must have the prefix
cleanroom.activation_data_, for example:cleanroom.activation_data_my_resultsAll columns in the internal results table should have the value
activation_allowed: TRUEin their data offering specification.The script block should end with a RETURN statement that returns the name of the generated table without the
cleanroom.activation_data_prefix, for example:RETURN 'my_results'.The template itself has no naming requirements.
Here is an example activation template specification:
Learn how to implement activation in a collaboration: Activating query results.
Next steps¶
After you’ve mastered the templating system, read the specifics for implementing a clean room with your template type:
Activation templates create a results table after a successful run and is shared outside of the clean room. Depending on the collaboration specification, the results table can be shared to the analysis runner or other collaborators.
Code bundles are used to upload custom Python UDFs and UDTFs into a collaboration. Templates in the collaboration can run these functions to perform complex data actions.
Internal tables are used to store intermediary or persistent results, which can be used downstream to support multistep workflows. These tables are accessible to templates or custom uploaded code inside the clean room.