Creating custom clean room templates¶
About clean room templates¶
Clean room templates are written in JinjaSQL. JinjaSQL is an extension to the Jinja templating language that generates a SQL query as output. JinjaSQL supports logic statements and run-time variable resolution to let the user customize the query at run time. Variables are typically used in a template to allow a user to specify table names, table columns, and custom values to use in their query.
Snowflake provides a selection of pre-designed templates for common use cases. These stock templates can be used only in the web application. However, both providers and consumers can create custom templates for a clean room. Custom templates can be created only in code, but can be run either in code or through the web app.
Creating and running a custom template¶
In a clean room with default settings, the provider adds a template to a clean room and the consumer can choose, configure, and run it:
The provider designs a custom template and adds it to a clean room by calling
provider.add_custom_sql_template
.The consumer calls
consumer.run_analysis
to run the provider’s template, passing in values for any variables needed by the template.
This flow does not require permissions from the other party, other than that the consumer must be invited to a clean room by the provider. There are variations to this process, such as consumer-provided templates and provider-run templates, which are covered elsewhere.
Tip
When developing a custom template, you will probably modify the template and re-run the same query. To avoid receiving cached responses
against your old template, set the consumer.run_analysis
argument use_cache
argument to FALSE.
Data protection¶
Templates can access only datasets linked into the clean room by the provider and consumer.
Both the provider and consumer have the ability to set join, column, and activation policies on their data to protect which columns can be joined on, projected, or projected in activated results.
A quick example¶
Here is a simple SQL example that joins a provider and a consumer table by email and shows the overlap count per city:
SELECT COUNT(*), city FROM consumer_table
INNER JOIN provider_table
ON consumer_table.hashed_email = provider_table.hashed_email
GROUP BY city;
Here is how that query would look as a template that allows the caller to choose the select/group and join columns and the tables:
SELECT COUNT(*), IDENTIFIER({{ group_by_col | column_policy }})
FROM IDENTIFIER({{ my_table[0] }}) AS C
INNER JOIN IDENTIFIER({{ source_table[0] }}) AS P
ON IDENTIFIER({{ consumer_join_col | join_policy }}) = IDENTIFIER({{ provider_join_col | join_policy }})
GROUP BY IDENTIFIER({{ group_by_col | column_policy }});
Notes on the template:
Values within {{ double bracket pairs }} are custom variables.
group_by_col
,my_table
,source_table
,consumer_join_col
,provider_join_col
, andgroup_by_col
are all custom variables populated by the caller.source_table
andmy_table
are Snowflake-defined string array variables populated by the caller. Array members are fully-qualified names of provider and consumer tables linked into the clean room. The caller specifies which tables should be included in each array.Provider tables must be aliased as
P
and consumer tables asC
in a template. If you have multiple tables, you can index them asP1
,P2
,C1
,C2
, and so on.IDENTIFIER is needed for all column and table names, because variables in {{ double brackets }} evaluate to string literals, which aren’t valid identifiers.
JinjaSQL filters can be applied to variables. Snowflake implements the custom filters
join_policy
andcolumn_policy
, which verify whether a column complies with join or column policies in the clean room respectively, and fail the query if it does not. A filter is applied to a column name as{{ column_name | filter_name }}
.
All these points will be discussed in detail later.
Here is how a consumer might run this template in code. Note how column names are qualified by the table aliases declared in the template.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.CONSUMER.RUN_ANALYSIS(
$cleanroom_name,
$template_name,
['my_db.my_sch.consumer_table], -- Populates the my_table variable
['my_db.my_sch.provider_table'], -- Populates the source_table variable
OBJECT_CONSTRUCT( -- Populates custom named variables
'consumer_join_col','c.age_band',
'provider_join_col','p.age_band',
'group_by_col','p.device_type'
)
);
In order to use this template in the web application, the provider must create a custom UI form for the template. The UI form has named form elements that correspond to template variable names, and the values provided in the form are passed into the template.
Custom template syntax¶
Snowflake Data Clean Rooms supports V3 JinjaSQL, with a few extensions as noted.
This section includes the following topics:
Template variables¶
Template callers can pass in values to template variables. JinjaSQL syntax enables variable binding for any variable name within {{ double_brackets }}, but Snowflake reserves a few variable names that you should not override, as described below.
Caution
All variables, whether Snowflake-defined or custom, are populated by the user and should be treated with appropriate caution. Snowflake Data Clean Rooms templates must resolve to a single SELECT statement, but you should still remember that all variables are passed in by the caller.
Snowflake-defined variables¶
All clean room templates have access to the following global variables defined by Snowflake, but passed in by the caller:
source_table
: A zero-based string array of provider-linked tables and views in the clean room that can be used by the template. Table names are fully qualified, for example:my_db.my_sch.provider_customers
Example:
SELECT col1 FROM IDENTIFIER({{ source_table[0] }}) AS P;
my_table
: A zero-based string array of consumer tables and views in the clean room that can be used by the template. Table names are fully qualified, for example:my_db.my_sch.consumer_customers
Example:
SELECT col1 FROM IDENTIFIER({{ my_table[0] }}) AS C;
Note
There are two legacy clean room global variables: measure_columns
and dimensions
. They are no longer recommended for use, but are
still defined and appear in some legacy templates and documentation, so you should not alias tables or columns using either of these
names to avoid naming collisions.
Custom variables¶
Template creators can include arbitrary variables in a template that can be populated by the caller. These variables can have any arbitrary Jinja-compliant name except for the Snowflake-defined variables or table alias names. If you want your template to be usable in the web application, you must also provide a UI form for web app users. For code users, you should provide good documentation for the required and optional variables.
Custom variables can be accessed by your template, as shown here for the custom variable max_income
:
SELECT income FROM my_db.my_sch.customers WHERE income < {{ max_income }};
Users can pass variables to a template in two different ways:
In the web app, by selecting or providing values through a UI form created by the template developer. This UI form contains form elements where the user can provide values for your template. The name of the form element is the name of the variable. The template simply uses the name of the form element to access the value. Create the UI form using provider.add_ui_form_customizations.
In code, a consumer calls consumer.run_analysis and passes in table names as argument arrays, and custom variables as name-value pairs into the
analysis_arguments
argument.
Note
If you need to access user-provided values in any custom Python code uploaded to the clean room, you must
explicitly pass variable values in to the code through Python function arguments; template
variables are not directly accessible within the python code using {{jinja variable binding syntax}}
.
Resolving variables correctly¶
String values passed into the template resolve to a string literal in the final template. This can cause SQL parsing or logical errors if you don’t handle bound variables appropriately:
SELECT {{ my_col }} FROM P;
resolves toSELECT 'my_col' from P;
which simply returns the string “my_col” - probably not what you want.SELECT age FROM {{ my_table[0] }} AS P;
resolves toSELECT age FROM 'somedb.somesch.my_table' AS P;
, which causes a parsing error because a table must be an identifier, not a literal string.SELECT age FROM IDENTIFIER({{ my_table[0] }}) AS P {{ where_clause }};
passing in “WHERE age < 50” evaluates toSELECT age FROM mytable AS P 'WHERE age < 50';
, which is a parsing error because of the literal string WHERE clause.
Therefore, where appropriate, you must resolve variables. Here is how to resolve variables properly in your template:
- Table and column names
Variables that specify table or column names must be converted to identifiers in your template in one of two ways:
IDENTIFIER: For example:
SELECT IDENTIFIER({{ my_column }}) FROM P;
sqlsafe: This JinjaSQL filter resolves identifier strings to SQL text. An equivalent statement to the previous bullet is
SELECT {{ my_column | sqlsafe }} FROM P;
Your particular usage dictates when to use IDENTIFIER or
sqlsafe
. For example,c.{{ my_column | sqlsafe }}
can’t easily be rewritten using IDENTIFIER.- Dynamic SQL
When you have a string variable that should be used as literal SQL, such as a WHERE clause, use the
sqlsafe
filter in your template. For example:SELECT age FROM IDENTIFIER({{ my_table[0] }}) AS C WHERE {{ where_clause }};
If a user passes in “age < 50” to
where_clause
, the query would resolve toSELECT age FROM sometable AS C WHERE 'age < 50';
which is invalid SQL because of the literal string WHERE condition. In this case you should use thesqlsafe
filter:SELECT age FROM IDENTIFIER( {{ my_table[0] }} ) as C {{ where_clause | sqlsafe }};
Required table aliases¶
At the top level of your query, all tables or subqueries must be aliased as either P
(for provider-tables) or C
(for consumer
tables) in order for Snowflake to validate join and column policies correctly in the query. Any column that must be verified against join
or column policies belong to a table that is aliased as either P
or C
. (Specifying P
or C
tells the back end
whether to validate a column against the provider or the consumer policy respectively.)
If you use multiple provider or consumer tables in your query, add a numeric, sequential 1-based suffix to each table alias after the
first. So: P
, P1
, P2
, and so on for the first, second, and third provider tables, and C
, C1
, C2
, and so on for the
first, second, and third consumer tables. The P
or C
index should be sequential without gaps (that is, create the aliases P
,
P1
, and P2
, not P
, P2
, and P4
).
Example
SELECT col1 FROM IDENTIFIER({{ source_table[0] }}) AS P;
Template filters¶
Snowflake supports most of the standard JinjaSQL filters, along with a few custom filters:
join_policy
: Verifies whether the column is allowed by the table’s join policy, and fails if it is not.column_policy
: Verifies whether the column is allowed by the template’s column policy (is allowed to be projected).activation_policy
: Verifies whether the filtered column is allowed by the clean room’s activation policies (provider.set_activation_policy
orconsumer.set_activation_policy
).join_and_column_policy
: Verifies whether the column is permitted by the join, activation, or column policies. Used to provide more flexibility in the clean room, to allow the collaborators to update join and column policies without changing the template.sql_safe
: Standard JinjaSQL filter. Use to convert string literals into SQL statements or identifiers as needed.bind
: Standard JinjaSQL filter.inclause
: Standard JinjaSQL filter.The
`identifier`
JinjaSQL filter is not supported by Snowflake templates.
JinjaSQL filters modify the content that precedes it:
{{ my_col | column_policy }}
Correct{{ column_policy | my_col }}
Incorrect
Running custom Python code¶
Templates can run Python code uploaded to the clean room. The template can call a Python function that accepts values from a row of data and returns values to use or project in the query.
When a provider uploads custom Python code into a clean room, the template calls Python functions with the syntax
cleanroom.function_name
. More details here.When a consumer uploads custom Python code into a clean room, the template calls the function with the bare
function_name
value passed toconsumer.generate_python_request_template
(not scoped tocleanroom
as provider code is). More details here.
Provider code example:
-- Provider uploads a Python function that takes two numbers and returns the sum.
call samooha_by_snowflake_local_db.provider.load_python_into_cleanroom(
$cleanroom_name,
'simple_addition', -- Function name to use in the template
['someval integer', 'added_val integer'], -- Arguments
[], -- No packages needed
'integer', -- Return type
'main', -- Handler for function name
$$
def main(input, added_val):
return input + int(added_val)
$$
);
-- Template passes value from each row to the function, along with a
-- caller-supplied argument named 'increment'
call samooha_by_snowflake_local_db.provider.add_custom_sql_template(
$cleanroom_name,
'simple_python_example',
$$
SELECT val, cleanroom.simple_addition(val, {{ increment | sqlsafe }})
FROM VALUES (5),(8),(12),(39) AS P(val);
$$
);
Security considerations¶
A template must evaluate to a single SELECT query, which is executed by the clean room native application. The template is not executed with the identity of the current user.
The user does not have direct access to any data within the clean room; all access is through the native application via the template results.
Apply a policy filter any time a column is used in your query, even when you define a column name explicitly in the template, or when
the column or table is provided by you. You might change your join or column policies later, or change the column, and forget to update the
template. For any columns provided by the user, you should apply a join_policy
, column_policy
, join_and_column_policy
, or
activation_policy
filter.
Next steps¶
After you’ve mastered the templating system, read the specifics for implementing a clean room with your template type:
Provider templates are templates written by the provider. This is the default use case.
Consumer templates are templates written by the consumer. In some cases, a clean room creator wants to enable the consumer to create, upload, and run their own templates to the clean room.
Activation templates create a results table after a successful run. Depending on the activation template, the results table can either be saved to the provider or consumer’s account outside the clean room, or sent to a third-party activation provider listed in the Activation Hub.
Chained templates allow you to chain together multiple templates where the output of each template is used by the next template in the chain.