Custom clean room template reference¶

About clean room templates¶

Clean room templates are written in JinjaSQL. JinjaSQL is an extension to the Jinja templating language that generates a SQL query as output. JinjaSQL supports logic statements and run-time variable resolution to let the user customize the query at run time. Variables are typically used in a template to allow a user to specify table names, table columns, and custom values to use in their query.

Snowflake provides a selection of pre-designed templates for common use cases. These stock templates can be used only in the clean rooms UI. However, both providers and consumers can create custom templates for a clean room. Custom templates can be created only in code, but can be run either in code or through the clean rooms UI.

There are two general types of templates:

Analysis templates, which evaluate to a SELECT statement (or a set of SELECT operations).
Activation templates, which evaluate to a SELECT statement nested inside a CREATE TABLE statement, and return the table name. This template generates data that is exported to the consumer or provider’s Snowflake account or a third party, depending on how the clean room is configured. An activation template is very similar to an analysis template with a few extra requirements.

In the clean rooms UI, an analysis template can be associated with an activation template to enable the caller to run an analysis and then send data to themselves or a third party. The activation template does not need to resolve to the same query as the associated analysis template.

Creating and running a custom template¶

In a clean room with default settings, the provider adds a template to a clean room and the consumer can choose, configure, and run it:

The provider designs a custom template and adds it to a clean room by calling provider.add_custom_sql_template.
The consumer calls consumer.run_analysis to run the provider’s template, passing in values for any variables needed by the template.

This flow does not require permissions from the other party, other than that the consumer must be invited to a clean room by the provider. There are variations to this process, such as consumer-provided templates and provider-run templates, which are covered elsewhere.

A quick example¶

Here is a simple SQL example that joins a provider and a consumer table by email and shows the overlap count per city:

SELECT COUNT(*), city FROM consumer_table
  INNER JOIN provider_table
  ON consumer_table.hashed_email = provider_table.hashed_email
  GROUP BY city;


Here is how that query would look as a template that allows the caller to choose the select/group and join columns and the tables:

SELECT COUNT(*), IDENTIFIER({{ group_by_col | column_policy }})
  FROM IDENTIFIER({{ my_table[0] }}) AS C
  INNER JOIN IDENTIFIER({{ source_table[0] }}) AS P
  ON IDENTIFIER({{ consumer_join_col | join_policy }}) = IDENTIFIER({{ provider_join_col | join_policy }})
  GROUP BY IDENTIFIER({{ group_by_col | column_policy }});


Notes on the template:

Values within {{ double bracket pairs }} are template variables. group_by_col, consumer_join_col, and provider_join_col are custom variables populated by the caller.
source_table and my_table are Snowflake-defined string array variables populated by the caller. Array members are fully-qualified names of provider and consumer tables linked into the clean room. The caller specifies which tables should be included in each array.
Provider tables must be aliased as P and consumer tables as C in a template. If you have multiple tables, you can index them as P1, P2, C1, C2, and so on.
IDENTIFIER is needed for all column and table names, because variables in {{ double brackets }} evaluate to string literals, which aren’t valid identifiers.
JinjaSQL filters can be applied to variables. Snowflake implements the custom filters join_policy and column_policy, which verify whether a column complies with join or column policies in the clean room respectively, and fail the query if it does not. A filter is applied to a column name as {{ column_name | filter_name }}.

All these points will be discussed in detail later.

Here is how a consumer might run this template in code. Note how column names are qualified by the table aliases declared in the template.

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.CONSUMER.RUN_ANALYSIS(
  $cleanroom_name,
  $template_name,
  ['my_db.my_sch.consumer_table'],      -- Populates the my_table variable
  ['my_db.my_sch.provider_table'],      -- Populates the source_table variable
  OBJECT_CONSTRUCT(                     -- Populates custom named variables
    'consumer_join_col','c.age_band',
    'provider_join_col','p.age_band',
    'group_by_col','p.device_type'
  )
);


In order to use this template in the clean rooms UI, the provider must create a custom UI form for the template. The UI form has named form elements that correspond to template variable names, and the values provided in the form are passed into the template.

Developing a custom template¶

Clean room templates are JinjaSQL templates. To create a template, you should be familiar with the following topics:

Jinja templating basics
The JinjaSQL extension to Jinja.

Use the consumer.get_sql_jinja procedure to check that your template is valid and to render it, then run the rendered query to confirm that it produces the results you want. Note that this procedure doesn’t support clean room filter extensions, such as join_policy, so you must test your template without those filters and add them afterward.

Example:

-- Template to test
SELECT {{ col1 | sqlsafe }}, {{ col2 | sqlsafe }}
  FROM IDENTIFIER({{ source_table[0] }}) AS p
  JOIN IDENTIFIER({{ my_table[0] }}) AS c
  ON {{ provider_join_col | sqlsafe }} = {{ consumer_join_col | sqlsafe}}
  {% if where_phrase %} WHERE {{ where_phrase | sqlsafe}}{% endif %};

-- Render the template.
USE WAREHOUSE app_wh;
USE ROLE samooha_app_role;

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.CONSUMER.GET_SQL_JINJA(
$$
SELECT {{ col1 | sqlsafe }}, {{ col2 | sqlsafe }}
  FROM IDENTIFIER({{ source_table[0] }}) AS p
  JOIN IDENTIFIER({{ my_table[0] }}) AS c
  ON {{ provider_join_col | sqlsafe }} = {{ consumer_join_col | sqlsafe}}
  {% if where_phrase %} WHERE {{ where_phrase | sqlsafe }}{% endif %};
  $$,
  object_construct(
'col1', 'c.status',
'col2', 'c.age_band',
'where_phrase', 'p.household_size > 2',
'consumer_join_col', 'c.age_band',
'provider_join_col', 'p.age_band',
'source_table', ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],
'my_table', ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS']
));

-- The rendered template looks like this:
SELECT c.status, c.age_band
  FROM IDENTIFIER('SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS') AS p
  JOIN IDENTIFIER('SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS') AS c
  ON p.age_band = c.age_band
  WHERE p.household_size > 2;

-- Run it.

-- Test without a WHERE clause
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.CONSUMER.GET_SQL_JINJA(
$$
SELECT {{ col1 | sqlsafe }}, {{ col2 | sqlsafe }}
  FROM IDENTIFIER({{ source_table[0] }}) AS p
  JOIN IDENTIFIER({{ my_table[0] }}) AS c
  ON {{ provider_join_col | sqlsafe }} = {{ consumer_join_col | sqlsafe}}
  {% if where_phrase %} WHERE {{ where_phrase | sqlsafe }}{% endif %};
  $$,
  object_construct(
'col1', 'c.status',
'col2', 'c.age_band',
'consumer_join_col', 'c.age_band',
'provider_join_col', 'p.age_band',
'source_table', ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],
'my_table', ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS']
));

-- Output
SELECT c.status, c.age_band
  FROM IDENTIFIER('SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS') AS p
  JOIN IDENTIFIER('SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS') AS c
  ON p.age_band = c.age_band
  ;

-- Put in the policy filters and declare the template
CALL samooha_by_snowflake_local_db.provider.add_custom_sql_template(
    $cleanroom_name,
    'simple_template',
    $$
    SELECT {{ col1 | sqlsafe | column_policy }}, {{ col2 | sqlsafe | column_policy }}
      FROM IDENTIFIER({{ source_table[0] }}) AS p
      JOIN IDENTIFIER({{ my_table[0] }}) AS c
      ON {{ provider_join_col | sqlsafe | join_policy }} = {{ consumer_join_col | sqlsafe | join_policy }}
      {% if where_phrase %} WHERE {{ where_phrase | sqlsafe }}{% endif %};
    $$
);


Data protection¶

Templates can access only datasets linked into the clean room by the provider and consumer.

Both the provider and consumer can set join, column, and activation policies on their data to protect which columns can be joined on, projected, or activated; however, the template must include the appropriate JinjaSQL filter on a column for the policy to be applied.

Custom template syntax¶

Snowflake Data Clean Rooms supports V3 JinjaSQL, with a few extensions as noted.

Template naming rules¶

When creating a template, names must consist only of lowercase letters, numbers, or underscores. Activation templates (except for consumer-run provider activation) must have a name beginning with “activation”. Template names are assigned when you call provider.add_custom_sql_template or consumer.create_template_request.

Example valid names:

my_template
activation_template_1

Example invalid names:

my template - Spaces not allowed
My_Template - Only lowercase letters allowed
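
The naming rules can be checked before a template is registered. The following is a minimal Python sketch, not part of the clean rooms API: the helper name is_valid_template_name and its activation flag are hypothetical, and the check follows the examples above in rejecting spaces and uppercase letters.

```python
import re

# Hypothetical helper, not a Snowflake API. Assumes names may contain only
# lowercase letters, digits, and underscores, as in the examples above.
_NAME_RE = re.compile(r"[a-z0-9_]+")

def is_valid_template_name(name: str, activation: bool = False) -> bool:
    """Return True if `name` follows the naming rules sketched above."""
    if not _NAME_RE.fullmatch(name):
        return False
    # Most activation templates need a name beginning with "activation".
    if activation and not name.startswith("activation"):
        return False
    return True

for candidate in ["my_template", "activation_template_1", "my template", "My_Template"]:
    print(candidate, is_valid_template_name(candidate))
```

Consumer-run provider activation templates are exempt from the prefix rule, so leave activation set to False for those.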

Template variables¶

Template callers can pass in values to template variables. JinjaSQL syntax enables variable binding for any variable name within {{ double_brackets }}, but Snowflake reserves a few variable names that you should not override, as described below.

Caution

All variables, whether Snowflake-defined or custom, are populated by the user and should be treated with appropriate caution. Snowflake Data Clean Rooms templates must resolve to a single SELECT statement, but you should still remember that all variables are passed in by the caller.

Snowflake-defined variables¶

All clean room templates have access to the following global variables defined by Snowflake, but passed in by the caller:

source_table:

A zero-based string array of provider-linked tables and views in the clean room that can be used by the template. Table names are fully qualified, for example: my_db.my_sch.provider_customers

Example: SELECT col1 FROM IDENTIFIER({{ source_table[0] }}) AS p;

my_table:

A zero-based string array of consumer tables and views in the clean room that can be used by the template. Table names are fully qualified, for example: my_db.my_sch.consumer_customers

Example: SELECT col1 FROM IDENTIFIER({{ my_table[0] }}) AS c;

privacy:

A set of privacy-related values associated with users and templates. See the list of available child fields. These values can be set explicitly for the user, but your template should always provide a default value in case they are not set. Access the child fields directly in your template, such as privacy.threshold.

Example: Here is an example snippet of a template that uses threshold_value to enforce a minimum group size in an aggregation clause.

SELECT
  IFF(a.overlap > ( {{ privacy.threshold_value | default(2)  | sqlsafe }} ),
                    a.overlap,1 ) AS overlap,
  c.total_count AS total_count
  ...
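
To make the fallback logic concrete, here is a rough Python equivalent of the IFF expression in the snippet above. It is an illustration only: masked_overlap is a hypothetical name, and a plain dict stands in for the privacy variable.

```python
def masked_overlap(overlap: int, privacy: dict) -> int:
    """Mirror IFF(overlap > threshold, overlap, 1) with a default threshold of 2."""
    # Like default(2) in the template, fall back to 2 when the caller
    # has not set threshold_value.
    threshold = privacy.get("threshold_value", 2)
    return overlap if overlap > threshold else 1

print(masked_overlap(10, {"threshold_value": 5}))  # threshold set explicitly
print(masked_overlap(2, {}))                       # threshold falls back to 2
```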


Note

There are two legacy clean room global variables: measure_columns and dimensions. They are no longer recommended for use, but are still defined and appear in some legacy templates and documentation, so you should not alias tables or columns using either of these names to avoid naming collisions.

Custom variables¶

Template creators can include arbitrary variables in a template that can be populated by the caller. These variables can have any arbitrary Jinja-compliant name except for the Snowflake-defined variables or table alias names. If you want your template to be usable in the clean rooms UI, you must also provide a UI form for clean rooms UI users. For API users, you should provide good documentation for the required and optional variables.

Custom variables can be accessed by your template, as shown here for the custom variable max_income:

SELECT income FROM my_db.my_sch.customers WHERE income < {{ max_income }};


Users can pass variables to a template in two different ways:

In the clean rooms UI, by selecting or providing values through a UI form created by the template developer. This UI form contains form elements where the user can provide values for your template. The name of the form element is the name of the variable. The template simply uses the name of the form element to access the value. Create the UI form using provider.add_ui_form_customizations.
In code, a consumer calls consumer.run_analysis and passes in table names as argument arrays, and custom variables as name-value pairs into the analysis_arguments argument.

Note

If you need to access user-provided values in any custom Python code uploaded to the clean room, you must explicitly pass variable values in to the code through Python function arguments; template variables are not directly accessible within the Python code using {{jinja variable binding syntax}}.

Resolving variables correctly¶

String values passed into the template resolve to a string literal in the final template. This can cause SQL parsing or logical errors if you don’t handle bound variables appropriately:

SELECT {{ my_col }} FROM P; resolves to SELECT 'my_col' from P; which simply returns the string “my_col” - probably not what you want.
SELECT age FROM {{ my_table[0] }} AS P; resolves to SELECT age FROM 'somedb.somesch.my_table' AS P;, which causes a parsing error because a table must be an identifier, not a literal string.
SELECT age FROM IDENTIFIER({{ my_table[0] }}) AS P {{ where_clause }}; with “WHERE age < 50” passed in as where_clause resolves to SELECT age FROM mytable AS P 'WHERE age < 50';, which causes a parsing error because the WHERE clause is a string literal.

Therefore, where appropriate, you must resolve variables. Here is how to resolve variables properly in your template:

Table and column names

Variables that specify table or column names must be converted to identifiers in your template in one of two ways:

IDENTIFIER: For example: SELECT IDENTIFIER({{ my_column }}) FROM P;
sqlsafe: This JinjaSQL filter resolves identifier strings to SQL text. An equivalent statement to the previous bullet is SELECT {{ my_column | sqlsafe }} FROM P;

Your particular usage dictates when to use IDENTIFIER or sqlsafe. For example, c.{{ my_column | sqlsafe }} can’t easily be rewritten using IDENTIFIER.

Dynamic SQL

When you have a string variable that should be used as literal SQL, such as a WHERE clause, use the sqlsafe filter in your template. For example, consider this template, which omits the filter:

SELECT age FROM IDENTIFIER({{ my_table[0] }}) AS C WHERE {{ where_clause }};


If a user passes in “age < 50” to where_clause, the query would resolve to SELECT age FROM sometable AS C WHERE 'age < 50'; which is invalid SQL because of the literal string WHERE condition. In this case you should use the sqlsafe filter:

SELECT age FROM IDENTIFIER( {{ my_table[0] }} ) as C WHERE {{ where_clause | sqlsafe }};
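
The difference between a bound string and a sqlsafe string can be illustrated with a toy renderer. This is a sketch under loud assumptions: it is not JinjaSQL and not how Snowflake renders templates, toy_render is a hypothetical name, and it understands only plain variable names with an optional sqlsafe filter.

```python
import re

# Toy model only: this is NOT JinjaSQL. It mimics the behavior described
# above, where a bound string becomes a quoted literal unless the sqlsafe
# filter is applied.
_EXPR_RE = re.compile(r"\{\{\s*(\w+)\s*(\|\s*sqlsafe\s*)?\}\}")

def toy_render(template: str, values: dict) -> str:
    def substitute(match):
        value = values[match.group(1)]
        if match.group(2):            # sqlsafe: insert the text as-is
            return str(value)
        if isinstance(value, str):    # default: bind as a string literal
            return "'" + value.replace("'", "''") + "'"
        return str(value)
    return _EXPR_RE.sub(substitute, template)

args = {"my_col": "age", "where_clause": "age < 50"}
# Without sqlsafe, the column name is rendered as a quoted string.
print(toy_render("SELECT {{ my_col }} FROM t AS P;", args))
# With sqlsafe, both values are rendered as SQL text.
print(toy_render(
    "SELECT {{ my_col | sqlsafe }} FROM t AS P WHERE {{ where_clause | sqlsafe }};",
    args,
))
```

Because sqlsafe inserts caller-supplied text verbatim, it is worth pairing it with a policy filter wherever the value names a column.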


Required table aliases¶

At the top level of your query, all tables or subqueries must be aliased as either P (for provider tables) or C (for consumer tables) in order for Snowflake to validate join and column policies correctly in the query. Any column that must be verified against join or column policies must belong to a table that is aliased as either P or C. (Specifying P or C tells the back end whether to validate a column against the provider or the consumer policy, respectively.)

If you use multiple provider or consumer tables in your query, add a numeric, sequential 1-based suffix to each table alias after the first. So: P, P1, P2, and so on for the first, second, and third provider tables, and C, C1, C2, and so on for the first, second, and third consumer tables. The P or C index should be sequential without gaps (that is, create the aliases P, P1, and P2, not P, P2, and P4).

Example

SELECT col1 FROM IDENTIFIER({{ source_table[0] }}) AS P;
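
The alias sequence described above can be generated mechanically. The following Python sketch is illustrative only; table_aliases is a hypothetical helper, not a clean rooms function.

```python
def table_aliases(prefix: str, count: int) -> list:
    """Return aliases for `count` tables: P, P1, P2, ... (or C, C1, C2, ...)."""
    if prefix not in ("P", "C"):
        raise ValueError("prefix must be 'P' (provider) or 'C' (consumer)")
    # The first table takes the bare prefix; later tables get sequential
    # 1-based suffixes with no gaps.
    return [prefix if i == 0 else f"{prefix}{i}" for i in range(count)]

print(table_aliases("P", 3))
print(table_aliases("C", 1))
```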


Template filters¶

Snowflake supports all the standard Jinja filters and most of the standard JinjaSQL filters, along with a few extensions:

join_policy: Verifies whether the column is allowed by the table’s join policy, and fails if it is not.
column_policy: Verifies whether the column is allowed by the template’s column policy (is allowed to be projected).
activation_policy: Verifies whether the filtered column is allowed by the clean room’s activation policies (provider.set_activation_policy or consumer.set_activation_policy).
join_and_column_policy: Verifies whether the column is permitted by the join, activation, or column policies. Used to provide more flexibility in the clean room, to allow the collaborators to update join and column policies without changing the template.
The identifier JinjaSQL filter is not supported by Snowflake templates.

JinjaSQL statements are evaluated left to right:

{{ my_col | column_policy }} Correct
{{ my_col | sqlsafe | column_policy }} Correct
{{ column_policy | my_col }} Incorrect
{{ my_col | column_policy | sqlsafe }} Incorrect: column_policy will be checked against the my_col value as string, which is an error.

Enforcing clean room policies¶

Clean rooms do not automatically check clean room policies against columns used in a template. If you want to enforce a policy against a column, you must apply the appropriate policy filter to that column in the template. For example:

JOIN IDENTIFIER({{ source_table[0] }}) AS p
  ON {{ c_join_col | sqlsafe | join_policy }} = {{ p_join_col | sqlsafe }}


This will test the join policy against the column passed in to c_join_col, but not against p_join_col.

Note that column names cannot be ambiguous when testing policies, the same as any other SQL usage. So if you have columns with the same name in two tables, you must qualify the column name in order to test the policy against that column.
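
Because policies are enforced only where a filter is present, it can help to scan a template for variables that have no policy filter. The following Python sketch is a rough heuristic, not a clean rooms feature: variables_without_policy is a hypothetical helper, and it relies on simple text matching rather than a real Jinja parser.

```python
import re

POLICY_FILTERS = {"join_policy", "column_policy", "activation_policy", "join_and_column_policy"}
TABLE_VARIABLES = {"source_table", "my_table"}  # table arrays, not columns

def variables_without_policy(template: str) -> list:
    """List variables in {{ ... }} expressions that have no policy filter."""
    unprotected = []
    for expr in re.findall(r"\{\{(.*?)\}\}", template, flags=re.DOTALL):
        parts = [p.strip() for p in expr.split("|")]
        # Take the variable name without indexing, for example my_table[0].
        name = re.match(r"\w+", parts[0])
        if name is None or name.group(0) in TABLE_VARIABLES:
            continue
        # Compare filter names only, ignoring arguments such as default(2).
        filters = {re.match(r"\w*", p).group(0) for p in parts[1:]}
        if not filters & POLICY_FILTERS and name.group(0) not in unprotected:
            unprotected.append(name.group(0))
    return unprotected

template = """
JOIN IDENTIFIER({{ source_table[0] }}) AS p
  ON {{ c_join_col | sqlsafe | join_policy }} = {{ p_join_col | sqlsafe }}
"""
print(variables_without_policy(template))
```

The helper also reports variables that are not column names, such as a WHERE clause or privacy, so treat its output as a list to review rather than a list of errors.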

Running custom Python code¶

Templates can run Python code uploaded to the clean room. The template can call a Python function that accepts values from a row of data and returns values to use or project in the query.

When a provider uploads custom Python code into a clean room, the template calls Python functions with the syntax cleanroom.function_name. More details here.
When a consumer uploads custom Python code into a clean room, the template calls the function with the bare function_name value passed to consumer.generate_python_request_template (not scoped to cleanroom as provider code is). More details here.

Provider code example:

-- Provider uploads a Python function that takes two numbers and returns the sum.
call samooha_by_snowflake_local_db.provider.load_python_into_cleanroom(
  $cleanroom_name,
  'simple_addition',                        -- Function name to use in the template
  ['someval integer', 'added_val integer'], -- Arguments
  [],                                       -- No packages needed
  'integer',                                -- Return type
  'main',                                   -- Handler for function name
  $$

def main(someval, added_val):
  # Parameter names match the arguments declared above.
  return someval + int(added_val)
  $$
);

-- Template passes value from each row to the function, along with a
-- caller-supplied argument named 'increment'
call samooha_by_snowflake_local_db.provider.add_custom_sql_template(
    $cleanroom_name,
    'simple_python_example',
$$
    SELECT val, cleanroom.simple_addition(val, {{ increment | sqlsafe }})
    FROM VALUES (5),(8),(12),(39) AS P(val);
$$
);
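
Before uploading a handler, you can sanity-check its logic locally in plain Python. This sketch reuses the handler logic from the example above; the increment value of 3 is an arbitrary, hypothetical caller-supplied argument, and nothing here runs inside a clean room.

```python
def main(someval, added_val):
    # Same logic as the handler uploaded in the example above.
    return someval + int(added_val)

increment = 3          # hypothetical caller-supplied value
rows = [5, 8, 12, 39]  # the VALUES rows used by the template

for val in rows:
    print(val, main(val, increment))
```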


Security considerations¶

A template must evaluate to a single SELECT query, which is executed by the clean room native application. The template is not executed with the identity of the current user.

The user does not have direct access to any data within the clean room; all access is through the native application via the template results.

Apply a policy filter any time a column is used in your query, even when you define a column name explicitly in the template, or when the column or table is provided by you. You might change your join or column policies later, or change the column, and forget to update the template. For any columns provided by the user, you should apply a join_policy, column_policy, join_and_column_policy, or activation_policy filter.

Activation templates¶

A template can also be used to save query results to a table outside of the clean room; this is called activation. Currently the only forms of activation supported for custom templates are provider activation and consumer activation (storing results to the provider or consumer’s Snowflake account, respectively). Learn how to implement activation.

An activation template is an analysis template with the following additional requirements:

Activation templates are JinjaSQL statements that evaluate to a SQL script block, unlike analysis templates, which can be simple SELECT statements.
The name of the activation template must begin with the string activation (except for consumer-run provider activation templates). For example: activation_my_template.
The activation template must create a table with a name that depends on the kind of activation it enables:
- Provider-run provider activation: The generated table name must be cleanroom.temp_result_data.
- All other activation types: The generated table name must be prefixed by cleanroom.activation_data_, for example: cleanroom.activation_data_cross_activation_results. The table name should be unique within your clean room.
This generated table is an intermediary table; you shouldn’t try to access it directly.
The script block should end with a RETURN statement that returns the name of the generated table, minus any cleanroom. or cleanroom.activation_data_ prefix.
Any columns being activated must be listed in the activation policy of the provider or consumer who linked the data, and should have the activation_policy filter applied to it. Note that a column can be both an activation and a join column.
If the template is to be run from the clean rooms UI, you should provide a web form that includes the activation_template_name and enabled_activations fields. Templates for use in the UI must have both an analysis template and an associated activation template.
All calculated columns must be explicitly aliased, rather than having inferred names, because a table is being generated. That is:

SELECT COUNT(*), P.status from T AS P; FAILS, because the COUNT column name is inferred.

SELECT COUNT(*) AS COUNT_OF_ITEMS, P.status from T AS P; SUCCEEDS, because it explicitly aliases the COUNT column.

Here are two sample basic activation templates. One is for provider-run provider activation, the other is for other activation types. They differ only in the results table name, which appears in the CREATE OR REPLACE TABLE line and the RETURN line.

-- These are the required table name strings.
BEGIN
  CREATE OR REPLACE TABLE cleanroom.temp_result_data AS
    SELECT COUNT(c.status) AS ITEM_COUNT, c.status, c.age_band
      FROM IDENTIFIER({{ my_table[0] }}) AS c
    JOIN IDENTIFIER({{ source_table[0] }}) AS p
      ON {{ c_join_col | sqlsafe | activation_policy }} = {{ p_join_col | sqlsafe | activation_policy }}
    GROUP BY c.status, c.age_band
    ORDER BY c.age_band;
  RETURN 'temp_result_data';
END;


-- analysis_results can be whatever name you want.
BEGIN
  CREATE OR REPLACE TABLE cleanroom.activation_data_analysis_results AS
    SELECT COUNT(c.status) AS ITEM_COUNT, c.status, c.age_band
      FROM IDENTIFIER({{ my_table[0] }}) AS c
    JOIN IDENTIFIER({{ source_table[0] }}) AS p
      ON {{ c_join_col | sqlsafe | activation_policy }} = {{ p_join_col | sqlsafe | activation_policy }}
    GROUP BY c.status, c.age_band
    ORDER BY c.age_band;
  RETURN 'analysis_results';
END;
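
The relationship between the generated table name and the RETURN value in the two samples above can be expressed as a small function. This Python sketch is illustrative only; activation_return_name is a hypothetical helper, not part of the clean rooms API.

```python
def activation_return_name(table_name: str) -> str:
    """Strip the cleanroom. or cleanroom.activation_data_ prefix from a table name."""
    # Check the longer prefix first so that it is removed in full.
    for prefix in ("cleanroom.activation_data_", "cleanroom."):
        if table_name.startswith(prefix):
            return table_name[len(prefix):]
    raise ValueError(f"unexpected activation table name: {table_name!r}")

print(activation_return_name("cleanroom.temp_result_data"))
print(activation_return_name("cleanroom.activation_data_analysis_results"))
```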


Next steps¶

After you’ve mastered the templating system, read the specifics for implementing a clean room with your template type:

Provider templates are templates written by the provider. This is the default use case.
Consumer templates are templates written by the consumer. In some cases, a clean room creator wants to enable the consumer to create their own templates, upload them to the clean room, and run them.
Activation templates create a results table after a successful run. Depending on the activation template, the results table can either be saved to the provider or consumer’s account outside the clean room, or sent to a third-party activation provider listed in the Activation Hub.
Chained templates allow you to chain together multiple templates where the output of each template is used by the next template in the chain.
