Tutorial: Get started with Snowflake Data Clean Rooms in code

Introduction

This tutorial is aimed at developers who will create or use Snowflake Data Clean Rooms in code. This tutorial uses SQL code, but you can adapt the information shown here to create and use clean rooms in any coding language supported by Snowflake.

What you will learn

This tutorial will show you how to create and share a basic template in a clean room using the Snowflake Data Clean Rooms API. It will also show you how to run an analysis using the API in a clean room shared with you.

This tutorial will create a clean room with one table provided by the provider, one table provided by the consumer, and a template defined by the provider that defines a very simple JOIN query on the two tables. The consumer runs the template.

Requirements

  • You should have a basic understanding of Snowflake and you should also read About Snowflake Data Clean Rooms before starting this tutorial.

  • You must have access to two Snowflake accounts. One will be used as a provider account (the account that creates the clean room), and the other will be used as a consumer account (the one who is shared with, and who runs the query).

  • Both accounts must have the Snowflake Data Clean Rooms environment installed. If you don’t have the environment installed in each account, you can either install it yourself, or else ask a Snowflake administrator to install it for you.

  • Both accounts must be capacity accounts, not on-demand accounts.

  • Both accounts should be in the same organization in Snowflake.

  • The provider account must be Enterprise Edition or higher. The consumer account can be Standard Edition or higher.

  • The consumer account must be added to the Collaborators page in the web app in the provider’s account.

  • Both accounts must be granted the SAMOOHA_APP_ROLE.

  • For this tutorial, both accounts must be in the same cloud region. You can determine your cloud region by running SELECT CURRENT_REGION();

  • For this tutorial, both accounts must be in the same organization.

Configure your environment

Ask your clean rooms administrator to add you as a user to two different accounts. Choose one to be the provider and one to be the consumer.

  1. Sign in to Snowsight in two separate browser tabs, one for each account. Decide which account you will use as the provider and which one as the consumer.

  2. Open a new notebook in each account. Name the provider notebook “Test provider” and the consumer notebook “Test consumer”.

The rest of this tutorial will indicate whether to take the desired actions in the provider or consumer account.

Provider: Overview

Here is a summary of the steps you’ll take to create a clean room:

  1. Create test data to share in your clean room.

  2. Create your clean room.

  3. Bring the data you created into the clean room.

  4. Set join permissions on your data to specify which columns can be joined on when queried by consumers.

  5. Create a template for your clean room. A clean room template is written in JinjaSQL and it evaluates to a SQL query at run time. Most templates include variables that allow collaborators to specify table and column names, WHERE clause conditions, and more, at run time. A clean room collaborator chooses and runs a template in a clean room.

  6. Specify the default version of the clean room.

  7. Add consumers who can access your clean room. In this tutorial consumers must be Snowflake users with accounts approved by your clean room administrator.

  8. Publish the clean room to make it available to your invited consumers.

Note

The term collaborator is used above for templates because, depending on how the clean room is configured, both providers and consumers can create or run templates. This tutorial shows only how to enable consumer-run templates.

Provider: Create test data

Start work in your provider account notebook.

First you will create a table based on 1,000 rows of sample test data from the SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS table. You’ll use that table as the provider data in the clean room.

Use a role that allows you to create a database. The example here uses ACCOUNTADMIN, but you can use any role that enables creating a table.

USE WAREHOUSE app_wh;
USE ROLE ACCOUNTADMIN;
-- Using ACCOUNTADMIN role because you need a role that allows you to create a database.
-- Feel free to use any other role that can create a database.

-- Generate a provider dataset based on the first 1,000 rows of sample data.
CREATE DATABASE IF NOT EXISTS cleanroom_tut_db;
CREATE SCHEMA IF NOT EXISTS cleanroom_tut_db.cleanroom_tut_sch;

CREATE TABLE IF NOT EXISTS cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table AS
  SELECT TOP 1000 * FROM SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS ORDER BY HASHED_EMAIL ASC;

DESCRIBE TABLE cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table;
Copy

Provider: Create the clean room

A clean room starts with no data, no users, and only default properties.

The following snippet creates a clean room that is accessible only within the organization (so it’s marked as INTERNAL). To share a clean room outside of an organization requires additional steps that won’t be covered in this tutorial.

You must use the SAMOOHA_APP_ROLE for nearly all clean room actions.

USE ROLE samooha_app_role;
SET cleanroom_name = 'Developer Tutorial';
CALL samooha_by_snowflake_local_db.provider.cleanroom_init($cleanroom_name, 'INTERNAL');
Copy

Provider: Bring data into the clean room

Let’s bring your test data into the clean room.

There are two steps in bringing data into the clean room:

  1. Registering the data

  2. Importing the data into the clean room

Register the data

The first step in importing data is to register the database, schema, or object. This grants the SELECT privilege to the clean room native application so it can read your data. This step must be done with a role that has the ability to grant SELECT permission to another role. You can register an entire database, or register specific schemas, tables, or views within the database, depending on what level of control you want.

This example uses the ACCOUNTADMIN role, but use whatever role can grant that ability in your own environment.

Note that you could also register data using the web application.

In the real world, the clean room administrator typically pre-registers data for all clean room creators, and you could skip this step.

USE ROLE ACCOUNTADMIN;
CALL samooha_by_snowflake_local_db.provider.register_db('cleanroom_tut_db');
Copy

Import the data into the clean room

Importing data into a clean room is called linking. Both providers and consumers can link their data into a clean room (and set rules on how it can be used, which will be covered later). The generic term for a view or table linked into a clean room is a dataset.

When you link data, the clean room creates a read-only view linked to your source data. This clean room view is a secure, encrypted view inside the clean room, accessible only to templates within the clean room. The view strips out Snowflake policies such as aggregation and join policies from the source data; the clean room supports its own protection and privacy settings. Your template accesses this secure view, not the source data, although the source name is used in the SQL query.

Unlike registering, linking is at the individual table or view level, and you can link multiple items in one call.

Link the table that you created earlier into the clean room:

-- Back to samooha_app_role until you need to clean things up at the end.
USE ROLE samooha_app_role;
CALL samooha_by_snowflake_local_db.provider.link_datasets($cleanroom_name,
  ['cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table']);

CALL samooha_by_snowflake_local_db.provider.view_provider_datasets($cleanroom_name);
Copy

Provider: Set join policies on the data

Both providers and consumers can specify join policies on their own data. A clean room join policy specifies which columns in a table can be joined on by queries in that clean room. This provides an extra level of control over how others can use your data in the clean room.

Note that clean room join policies are not the same as Snowflake join policies. Snowflake join policies, and all other table policies, are stripped out of the table when it is linked into the clean room.

Provide a list of joinable columns to the clean room. Any columns not listed here cannot be joined on using INNER JOIN or OUTER JOIN statements in the clean room. Specify joinable columns for a table using the format database_name.schema_name.table_or_view_name:column_name for each column.

-- Limit joinable columns in this table to age_band, region_code, and device_type
CALL samooha_by_snowflake_local_db.provider.set_join_policy($cleanroom_name,
  ['cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table:age_band',
   'cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table:region_code',
   'cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table:device_type']);

CALL samooha_by_snowflake_local_db.provider.view_join_policy($cleanroom_name);
Copy

Provider: Add your template

A clean room template is a JinjaSQL template that evaluates to a SELECT query. This query has access to all datasets linked into the clean room, subject to join and column policies.

This tutorial won’t cover the details of designing a JinjaSQL template, but here is the SQL query that you’re trying to implement:

SELECT COUNT(*), group_by_col FROM Consumer_Table AS C
  INNER JOIN Provider_Table AS P
  ON C.join_col = P.join_col
  GROUP BY group_col;
Copy

The query simply joins one provider and one consumer table on a specified join column, groups by a specified grouping column, and projects the group value and count of each group. This is the query that will run in the clean room when the user runs the template.

Here is the JinjaSQL template for the same query, with variables added where the consumer can specify tables or columns. After the consumer specifies the variables, it will evaluate to a SQL query similar to the one above, but with the table and column names provided by the consumer.

SELECT COUNT(*), IDENTIFIER({{group_by_col | column_policy}}) FROM IDENTIFIER({{my_table[0]}}) AS C
  INNER JOIN IDENTIFIER({{source_table[0]}}) AS P
  ON IDENTIFIER({{consumer_join_col | join_policy}}) = IDENTIFIER({{provider_join_col | join_policy}})
  GROUP BY IDENTIFIER({{group_by_col | column_policy}});
Copy

A few notes on the template:

  • Content surrounded by {{brackets}} are named variables passed in by the consumer when they run the template. The following variables are passed in by the consumer: group_by_col, consumer_join_col, provider_join_col

  • The my_table and source_table arrays are global variables created by the system, populated with consumer and provider table names passed in by the caller. These tables must be linked into the clean room by the consumer and provider.

  • All provider tables must be aliased as p in the query. All consumer tables must be aliased as c. If you use multiple tables, alias them with a 1-based suffix, so: p, p1, p2, p3 and so on for provider tables, and c, c1, c2, c3 and so on for consumer table aliases. (p and p0 are equivalent.)

  • Snowflake Data Clean Rooms supports some custom JinjaSQL filters that act on variables. The column_policy and row_policy filters verify that the columns they are applied to conform to the column and row policies in that clean room, or else the request to run the template will fail. So {{ consumer_join_col | join_policy }} verifies that the value passed in to consumer_join_col conforms to the join policies set by the provider and consumer in this clean room.

  • Variables used as identifiers must be processed by the IDENTIFIER function before they can be used in SQL.

Add the template to the clean room:

-- Add the template
SET template_name = 'overlap_template';
CALL samooha_by_snowflake_local_db.provider.add_custom_sql_template(
    $cleanroom_name,
    $template_name,
    $$
    SELECT COUNT(*), IDENTIFIER({{group_by_col | column_policy}}) FROM IDENTIFIER({{my_table[0]}}) AS C
      INNER JOIN IDENTIFIER({{source_table[0]}}) AS P
      ON IDENTIFIER({{consumer_join_col | join_policy}}) = IDENTIFIER({{provider_join_col | join_policy}})
      GROUP BY IDENTIFIER({{group_by_col | column_policy}});
    $$);

CALL samooha_by_snowflake_local_db.provider.view_added_templates($cleanroom_name);
Copy

Provider: Set column policies

Each party in the clean room can limit which columns the other parties can display in results by setting a column_policy. A column policy in a clean room lists all the columns that can be projected; no other columns can be projected. The provider sets the column policies for their tables; the consumer sets column policies for their tables.

A column policy is tied to a specific table and template in a clean room. You can allow different columns to be displayed in different templates. The same column cannot be in both a join and a column policy.

Note that column and join policies are enforced only if the template uses the column_policy and row_policy filters in the template. Also, if you do not specify a column policy, that means that all your columns can be projected; similarly, not specifying a join policy means that all your columns can be joined.

Here is how to allow projection of three columns of your data in the template we just created. Column syntax is template_name:table_name:column_name

-- Set column policies. Column policies are tied to a specific template and table, so we
-- needed to add the template first.
CALL samooha_by_snowflake_local_db.provider.set_column_policy($cleanroom_name,
  [$template_name || ':cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table:STATUS',
   $template_name || ':cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table:AGE_BAND',
   $template_name || ':cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table:DAYS_ACTIVE']);

CALL samooha_by_snowflake_local_db.provider.view_column_policy($cleanroom_name);
Copy

Provider: Add a release directive

Every clean room has a version number, consisting of major, minor, and patch values. You need to specify which version of the clean room is served to a consumer: this is called the default release directive.

This is the first version, so the version number is 1.0.0.

CALL samooha_by_snowflake_local_db.provider.set_default_release_directive($cleanroom_name, 'V1_0', '0');
Copy

Snowflake creates a new version of the clean room each time you upload code into the clean room, and if you want users to get the latest version, you’ll need to call this procedure again with the new version if you want users to get the latest version. You won’t be uploading code, so you won’t need to call this again.

Provider: Add consumers

Now specify which accounts can use your clean room as consumers. A few requirements when sharing a clean room:

  • Users must be in a Snowflake account with the clean room environment installed. It is possible to share a clean room with a non-Snowflake account, but that isn’t covered here.

  • The consumer account must be in the same cloud region as the provider account. You can share across regions, but that requires extra configuration that isn’t covered here.

  • The consumer account must be added to the Collaborators page in the provider’s account. Only accounts added to the collaborators list for an account can be extended an invitation to join a clean room created in that account.

  • The account locator can be obtained by running the following procedure in your consumer account: SELECT CURRENT_ACCOUNT();

  • The account name is in the format org_name.account_name. You can get these values by running the following procedures in your consumer account:

    • SELECT CURRENT_ORGANIZATION_NAME();

    • SELECT CURRENT_ACCOUNT_NAME();

CALL samooha_by_snowflake_local_db.provider.add_consumers(
  $cleanroom_name,
  <CONSUMER_LOCATOR>,
  <ORG_NAME>.<ACCOUNT_NAME>);

CALL samooha_by_snowflake_local_db.provider.view_consumers($cleanroom_name);
Copy

Provider: Publish the clean room

Finally, you can publish the clean room. This makes the clean room available to the consumers you added above. The procedure takes a minute or more to complete.

-- Publish the clean room.
CALL samooha_by_snowflake_local_db.provider.create_or_update_cleanroom_listing($cleanroom_name);
Copy

When the procedure finishes, you should see the clean room listed in the clean rooms web app of your provider account in the Created tab, and in the Invited tab of the web app of your consumer account, with the label “Powered by Dev Edition.” The consumer account will receive an invitation email and the clean room should appear in the Invited tab in the web app of the consumer account.

Congratulations: You’ve published your first clean room!

Now move over to the consumer account to use the clean room.

Consumer: Install (join) the clean room

Switch over to the consumer account in Snowsight. You can open a second tab in your browser, or use the account switcher.

After you’re signed in with your consumer account, set up your environment in your consumer notebook:

USE WAREHOUSE app_wh;
USE ROLE samooha_app_role;
Copy

Install the clean room you just published and shared.

Note

If you open the web application for this account, you will see this clean room in the Invited tab with the label Powered by Dev Edition to indicate that it was created in code. You could install the clean room from there, but we’ll show you how to do it in code here.

To install a clean room, you must specify both the clean room name and the account locator of the provider who shared the clean room with you. Specifying the clean room name and the account locator helps disambiguate multiple clean room invitations. You can run SELECT CURRENT_ACCOUNT(); in the provider account to get the provider locator.

Installation can take a few minutes.

SET cleanroom_name = 'Developer Tutorial';
CALL samooha_by_snowflake_local_db.consumer.install_cleanroom($cleanroom_name, <PROVIDER_LOCATOR>);
Copy

Consumer: Set join policies on your data

You can now set join policies on your data, just as you did in the provider account. Setting consumer join policies is redundant in this example because only consumers can run a template in this clean room. And since you’re running the template yourself, you know which of your columns should be joinable. However, in real usage it makes sense to set join policies on your data in case the clean room supports allowing the provider to run a template.

-- Allow same three columns in your data to be joined.
CALL samooha_by_snowflake_local_db.consumer.set_join_policy($cleanroom_name,
  ['cleanroom_tut_db.cleanroom_tut_sch.demo_consumer_table:age_band',
  'cleanroom_tut_db.cleanroom_tut_sch.demo_consumer_table:region_code',
  'cleanroom_tut_db.cleanroom_tut_sch.demo_consumer_table:device_type']);
Copy

Consumer: Prepare your query

To run a query, you need the following information:

  • The name of the template you want to run.

  • The names of your tables to make available to the template.

  • The names of the provider’s tables to make available to the template.

  • Any other name/value variables to pass in. In our template, we need to pass in join column names from the provider and consumer tables.

Examine the template

You can examine the template to see the exact syntax and see what you need to pass in.

-- List templates in the clean room, then examine the template details
CALL samooha_by_snowflake_local_db.consumer.view_added_templates($cleanroom_name);

SET template_name = 'overlap_template';
CALL samooha_by_snowflake_local_db.consumer.view_template_definition($cleanroom_name, $template_name);
Copy

You can see that you need to pass in a provider table and column name, a consumer table and column name, and a grouping column.

List the available provider tables

See which tables the provider has added to the clean room.

-- Table name to use is in the LINKED_TABLE column in the results.
CALL samooha_by_snowflake_local_db.consumer.view_provider_datasets($cleanroom_name);
Copy

Examine the joinable and projectable columns

See which columns can be joined on or projected from the provider’s data.

-- See which provider columns can be joined on
CALL samooha_by_snowflake_local_db.consumer.view_provider_join_policy($cleanroom_name);

-- See which provider columns can be projected
CALL samooha_by_snowflake_local_db.consumer.view_provider_column_policy($cleanroom_name);
Copy

Run the analysis

Now that we know what the query needs, what provider data is available, and what can be done with it, you can choose values to pass in.

Where column names are ambiguous in the template, you must fully qualify columns with the table name. You must use the table alias as the table name rather than the actual table name. Remember that the aliases in this template are p for the provider table, and c for the consumer table. For internal reasons, you must use lowercase p and c as the alias name.

  • Provider table: The only choice is cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table.

  • Consumer table: The only choice is cleanroom_tut_db.cleanroom_tut_sch.demo_consumer_table.

  • consumer_join_col: Let’s choose age_band. The fully qualified column name in the consumer table is c.age_band.

  • provider_join_col: We need to join on similar columns, so the fully qualified name is p.age_band.

  • group_by_col: Take your pick of provider or consumer columns from the remaining projectable columns. We will use p.device_type but you can choose any of the other provider or consumer columns returned by consumer.view_provider_column_policy.

These values are passed into consumer.run_analysis as shown here:

CALL samooha_by_snowflake_local_db.consumer.run_analysis(
  $cleanroom_name,
  $template_name,
  ['cleanroom_tut_db.cleanroom_tut_sch.demo_consumer_table'],
  ['cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table'],
  OBJECT_CONSTRUCT(
    'consumer_join_col','c.age_band',
    'provider_join_col','p.age_band',
    'group_by_col','p.device_type'
  ),
  FALSE
);
Copy

Tip

The last parameter indicates that the results should not be cached. We recommend not caching results during testing, which forces the API to re-run the query even if you pass in the same query. This is useful when changing the underlying template without changing the query.

Congratulations! You should see the template results in Snowsight.

Additional features not covered here allow you to export those results directly to your own Snowflake account, or to an approved third-party service in a process called Activation.

See more use cases and learn about more clean room features in the Snowflake Clean Rooms developer guide.

Both accounts: Clean up

Now let’s clean up all the resources you created.

Note that you’ll need to use the same role to delete your source tables that you used to create them.

Provider cleanup

Run this code in your provider account:

USE ROLE samooha_app_role;
CALL samooha_by_snowflake_local_db.provider.drop_cleanroom($cleanroom_name);

USE role ACCOUNTADMIN;
DROP TABLE cleanroom_tut_db.cleanroom_tut_sch.demo_provider_table;
DROP DATABASE cleanroom_tut_db;
Copy

Consumer cleanup

Run this code in your consumer account:

USE ROLE samooha_app_role;
CALL samooha_by_snowflake_local_db.consumer.uninstall_cleanroom($cleanroom_name);

USE ROLE ACCOUNTADMIN;
DROP VIEW cleanroom_tut_db.cleanroom_tut_sch.demo_consumer_table;
DROP DATABASE cleanroom_tut_db;
Copy