Creating custom clean room templates¶

About clean room templates¶

Clean room templates are written in JinjaSQL. JinjaSQL is an extension to the Jinja templating language that generates a SQL query as output. JinjaSQL supports logic statements and run-time variable resolution to let the user customize the query at run time. Variables are typically used in a template to allow a user to specify table names, table columns, and custom values to use in their query.

Snowflake provides a selection of pre-designed templates for common use cases. These stock templates can be used only in the web application. However, both providers and consumers can create custom templates for a clean room. Custom templates can be created only in code, but can be run either in code or through the web app.

Creating and running a custom template¶

In a clean room with default settings, the provider adds a template to a clean room and the consumer can choose, configure, and run it:

  1. The provider designs a custom template and adds it to a clean room by calling provider.add_custom_sql_template.

  2. The consumer calls consumer.run_analysis to run the provider’s template, passing in values for any variables needed by the template.

This flow does not require permissions from the other party, other than that the consumer must be invited to a clean room by the provider. There are variations to this process, such as consumer-provided templates and provider-run templates, which are covered elsewhere.

Tip

When developing a custom template, you will probably modify the template and re-run the same query. To avoid receiving cached responses against your old template, set the consumer.run_analysis argument use_cache argument to FALSE.

Data protection¶

Templates can access only datasets linked into the clean room by the provider and consumer.

Both the provider and consumer have the ability to set join, column, and activation policies on their data to protect which columns can be joined on, projected, or projected in activated results.

A quick example¶

Here is a simple SQL example that joins a provider and a consumer table by email and shows the overlap count per city:

SELECT COUNT(*), city FROM consumer_table
  INNER JOIN provider_table
  ON consumer_table.hashed_email = provider_table.hashed_email
  GROUP BY city;
Copy

Here is how that query would look as a template that allows the caller to choose the select/group and join columns and the tables:

SELECT COUNT(*), IDENTIFIER({{ group_by_col | column_policy }})
  FROM IDENTIFIER({{ my_table[0] }}) AS C
  INNER JOIN IDENTIFIER({{ source_table[0] }}) AS P
  ON IDENTIFIER({{ consumer_join_col | join_policy }}) = IDENTIFIER({{ provider_join_col | join_policy }})
  GROUP BY IDENTIFIER({{ group_by_col | column_policy }});
Copy

Notes on the template:

  • Values within {{ double bracket pairs }} are custom variables. group_by_col, my_table, source_table, consumer_join_col, provider_join_col, and group_by_col are all custom variables populated by the caller.

  • source_table and my_table are Snowflake-defined string array variables populated by the caller. Array members are fully-qualified names of provider and consumer tables linked into the clean room. The caller specifies which tables should be included in each array.

  • Provider tables must be aliased as P and consumer tables as C in a template. If you have multiple tables, you can index them as P1, P2, C1, C2, and so on.

  • IDENTIFIER is needed for all column and table names, because variables in {{ double brackets }} evaluate to string literals, which aren’t valid identifiers.

  • JinjaSQL filters can be applied to variables. Snowflake implements the custom filters join_policy and column_policy, which verify whether a column complies with join or column policies in the clean room respectively, and fail the query if it does not. A filter is applied to a column name as {{ column_name | filter_name }}.

All these points will be discussed in detail later.

Here is how a consumer might run this template in code. Note how column names are qualified by the table aliases declared in the template.

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.CONSUMER.RUN_ANALYSIS(
  $cleanroom_name,
  $template_name,
  ['my_db.my_sch.consumer_table],       -- Populates the my_table variable
  ['my_db.my_sch.provider_table'],      -- Populates the source_table variable
  OBJECT_CONSTRUCT(                     -- Populates custom named variables
    'consumer_join_col','c.age_band',
    'provider_join_col','p.age_band',
    'group_by_col','p.device_type'
  )
);
Copy

In order to use this template in the web application, the provider must create a custom UI form for the template. The UI form has named form elements that correspond to template variable names, and the values provided in the form are passed into the template.

Custom template syntax¶

Snowflake Data Clean Rooms supports V3 JinjaSQL, with a few extensions as noted.

This section includes the following topics:

Template variables¶

Template callers can pass in values to template variables. JinjaSQL syntax enables variable binding for any variable name within {{ double_brackets }}, but Snowflake reserves a few variable names that you should not override, as described below.

Caution

All variables, whether Snowflake-defined or custom, are populated by the user and should be treated with appropriate caution. Snowflake Data Clean Rooms templates must resolve to a single SELECT statement, but you should still remember that all variables are passed in by the caller.

Snowflake-defined variables¶

All clean room templates have access to the following global variables defined by Snowflake, but passed in by the caller:

  • source_table: A zero-based string array of provider-linked tables and views in the clean room that can be used by the template. Table names are fully qualified, for example: my_db.my_sch.provider_customers

    Example: SELECT col1 FROM IDENTIFIER({{ source_table[0] }}) AS P;

  • my_table: A zero-based string array of consumer tables and views in the clean room that can be used by the template. Table names are fully qualified, for example: my_db.my_sch.consumer_customers

    Example: SELECT col1 FROM IDENTIFIER({{ my_table[0] }}) AS C;

Note

There are two legacy clean room global variables: measure_columns and dimensions. They are no longer recommended for use, but are still defined and appear in some legacy templates and documentation, so you should not alias tables or columns using either of these names to avoid naming collisions.

Custom variables¶

Template creators can include arbitrary variables in a template that can be populated by the caller. These variables can have any arbitrary Jinja-compliant name except for the Snowflake-defined variables or table alias names. If you want your template to be usable in the web application, you must also provide a UI form for web app users. For code users, you should provide good documentation for the required and optional variables.

Custom variables can be accessed by your template, as shown here for the custom variable max_income:

SELECT income FROM my_db.my_sch.customers WHERE income < {{ max_income }};
Copy

Users can pass variables to a template in two different ways:

  • In the web app, by selecting or providing values through a UI form created by the template developer. This UI form contains form elements where the user can provide values for your template. The name of the form element is the name of the variable. The template simply uses the name of the form element to access the value. Create the UI form using provider.add_ui_form_customizations.

  • In code, a consumer calls consumer.run_analysis and passes in table names as argument arrays, and custom variables as name-value pairs into the analysis_arguments argument.

Note

If you need to access user-provided values in any custom Python code uploaded to the clean room, you must explicitly pass variable values in to the code through Python function arguments; template variables are not directly accessible within the python code using {{jinja variable binding syntax}}.

Resolving variables correctly¶

String values passed into the template resolve to a string literal in the final template. This can cause SQL parsing or logical errors if you don’t handle bound variables appropriately:

  • SELECT {{ my_col }} FROM P; resolves to SELECT 'my_col' from P; which simply returns the string “my_col” - probably not what you want.

  • SELECT age FROM {{ my_table[0] }} AS P; resolves to SELECT age FROM 'somedb.somesch.my_table' AS P;, which causes a parsing error because a table must be an identifier, not a literal string.

  • SELECT age FROM IDENTIFIER({{ my_table[0] }}) AS P {{ where_clause }}; passing in “WHERE age < 50” evaluates to SELECT age FROM mytable AS P 'WHERE age < 50';, which is a parsing error because of the literal string WHERE clause.

Therefore, where appropriate, you must resolve variables. Here is how to resolve variables properly in your template:

Table and column names

Variables that specify table or column names must be converted to identifiers in your template in one of two ways:

  • IDENTIFIER: For example: SELECT IDENTIFIER({{ my_column }}) FROM P;

  • sqlsafe: This JinjaSQL filter resolves identifier strings to SQL text. An equivalent statement to the previous bullet is SELECT {{ my_column | sqlsafe }} FROM P;

Your particular usage dictates when to use IDENTIFIER or sqlsafe. For example, c.{{ my_column | sqlsafe }} can’t easily be rewritten using IDENTIFIER.

Dynamic SQL

When you have a string variable that should be used as literal SQL, such as a WHERE clause, use the sqlsafe filter in your template. For example:

SELECT age FROM IDENTIFIER({{ my_table[0] }}) AS C WHERE {{ where_clause }};
Copy

If a user passes in “age < 50” to where_clause, the query would resolve to SELECT age FROM sometable AS C WHERE 'age < 50'; which is invalid SQL because of the literal string WHERE condition. In this case you should use the sqlsafe filter:

SELECT age FROM IDENTIFIER( {{ my_table[0] }} ) as C {{ where_clause | sqlsafe }};
Copy

Required table aliases¶

At the top level of your query, all tables or subqueries must be aliased as either P (for provider-tables) or C (for consumer tables) in order for Snowflake to validate join and column policies correctly in the query. Any column that must be verified against join or column policies belong to a table that is aliased as either P or C. (Specifying P or C tells the back end whether to validate a column against the provider or the consumer policy respectively.)

If you use multiple provider or consumer tables in your query, add a numeric, sequential 1-based suffix to each table alias after the first. So: P, P1, P2, and so on for the first, second, and third provider tables, and C, C1, C2, and so on for the first, second, and third consumer tables. The P or C index should be sequential without gaps (that is, create the aliases P, P1, and P2, not P, P2, and P4).

Example

SELECT col1 FROM IDENTIFIER({{ source_table[0] }}) AS P;
Copy

Template filters¶

Snowflake supports most of the standard JinjaSQL filters, along with a few custom filters:

  • join_policy: Verifies whether the column is allowed by the table’s join policy, and fails if it is not.

  • column_policy: Verifies whether the column is allowed by the template’s column policy (is allowed to be projected).

  • activation_policy: Verifies whether the filtered column is allowed by the clean room’s activation policies (provider.set_activation_policy or consumer.set_activation_policy).

  • join_and_column_policy: Verifies whether the column is permitted by the join, activation, or column policies. Used to provide more flexibility in the clean room, to allow the collaborators to update join and column policies without changing the template.

  • sql_safe: Standard JinjaSQL filter. Use to convert string literals into SQL statements or identifiers as needed.

  • bind: Standard JinjaSQL filter.

  • inclause: Standard JinjaSQL filter.

  • The `identifier` JinjaSQL filter is not supported by Snowflake templates.

JinjaSQL filters modify the content that precedes it:

  • {{ my_col | column_policy }} Correct

  • {{ column_policy | my_col }} Incorrect

Running custom Python code¶

Templates can run Python code uploaded to the clean room. The template can call a Python function that accepts values from a row of data and returns values to use or project in the query.

  • When a provider uploads custom Python code into a clean room, the template calls Python functions with the syntax cleanroom.function_name. More details here.

  • When a consumer uploads custom Python code into a clean room, the template calls the function with the bare function_name value passed to consumer.generate_python_request_template (not scoped to cleanroom as provider code is). More details here.

Provider code example:

-- Provider uploads a Python function that takes two numbers and returns the sum.
call samooha_by_snowflake_local_db.provider.load_python_into_cleanroom(
  $cleanroom_name,
  'simple_addition',                        -- Function name to use in the template
  ['someval integer', 'added_val integer'], -- Arguments
  [],                                       -- No packages needed
  'integer',                                -- Return type
  'main',                                   -- Handler for function name
  $$

def main(input, added_val):
  return input + int(added_val)
    $$
);

-- Template passes value from each row to the function, along with a
-- caller-supplied argument named 'increment'
call samooha_by_snowflake_local_db.provider.add_custom_sql_template(
    $cleanroom_name,
    'simple_python_example',
$$
    SELECT val, cleanroom.simple_addition(val, {{ increment | sqlsafe }})
    FROM VALUES (5),(8),(12),(39) AS P(val);
$$
);
Copy

Security considerations¶

A template must evaluate to a single SELECT query, which is executed by the clean room native application. The template is not executed with the identity of the current user.

The user does not have direct access to any data within the clean room; all access is through the native application via the template results.

Apply a policy filter any time a column is used in your query, even when you define a column name explicitly in the template, or when the column or table is provided by you. You might change your join or column policies later, or change the column, and forget to update the template. For any columns provided by the user, you should apply a join_policy, column_policy, join_and_column_policy, or activation_policy filter.

Next steps¶

After you’ve mastered the templating system, read the specifics for implementing a clean room with your template type:

  • Provider templates are templates written by the provider. This is the default use case.

  • Consumer templates are templates written by the consumer. In some cases, a clean room creator wants to enable the consumer to create, upload, and run their own templates to the clean room.

  • Activation templates create a results table after a successful run. Depending on the activation template, the results table can either be saved to the provider or consumer’s account outside the clean room, or sent to a third-party activation provider listed in the Activation Hub.

  • Chained templates allow you to chain together multiple templates where the output of each template is used by the next template in the chain.