Tutorial: Get started with Snowflake Data Clean Rooms in code¶
Introduction¶
This tutorial is aimed at developers who will create or use Snowflake Data Clean Rooms in code. This tutorial uses SQL code, but you can adapt the information shown here to create and use clean rooms in any coding language supported by Snowflake.
What you will learn¶
This tutorial shows you how to create and share a basic template in a clean room using the Snowflake Data Clean Room API. It also shows you how to run an analysis using the API in a clean room shared with you.
This tutorial creates a clean room with one table provided by the provider, one table provided by the consumer, and a template defined by the provider that defines a very simple JOIN query on the two tables.
Requirements¶
-
You should have a basic understanding of Snowflake and you should also read About Snowflake Data Clean Rooms before starting this tutorial.
-
You must have access to a Snowflake account, Enterprise Edition or higher, with the Snowflake Data Clean Rooms native app and API installed. If you don’t have the clean rooms app installed, you can either install it yourself, or else ask a Snowflake administrator to install it for you.
-
You must be granted the SAMOOHA_APP_ROLE to use the clean rooms API.
-
This tutorial uses a sample table named CUSTOMERS_2 that is installed with the clean rooms environment. Search your account for the table SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS_2 using the following command:
If the response has no rows, then you, or someone with ACCOUNTADMIN role, must run the following command to install the sample table:
This tutorial uses the same account to act as both a provider and a consumer in a clean room. This scenario is supported only for testing purposes and has limitations on what features it supports, compared to using separate accounts. In the real world, providers and consumers use different accounts, and for more advanced testing you might need to use separate accounts.
You can download this tutorial as a worksheet file to run in your Snowflake account.
Provider: Overview¶
Here is a summary of the steps that you’ll take to create a clean room as the provider:
- Create test data to share in your clean room.
- Create your clean room.
- Set join permissions on your data to specify which columns can be joined on in consumer queries.
- Create a template for your clean room. A clean room template is written in JinjaSQL and it evaluates to a SQL query at run time. Most templates include variables that allow collaborators to specify table and column names, WHERE clause conditions, and more, at run time. A clean room collaborator chooses and runs a template in a clean room.
- Specify the default version of the clean room.
- Add consumers who can access your clean room. In this tutorial consumers must be Snowflake users with accounts approved by your clean room administrator.
- Publish the clean room to make it available to your invited consumers.
Note
The term collaborator is used above for templates because, depending on how the clean room is configured, both providers and consumers can create or run templates. This tutorial shows only how to enable consumer-run templates.
Provider: Create the clean room¶
Sign in to Snowsight as a user granted the SAMOOHA_APP_ROLE role. If you don’t have that role, ask your account administrator to grant it to you.
Create a new SQL worksheet in Snowsight to hold your clean room code. Name the worksheet “API tutorial - Provider”.
The following snippet creates a clean room that is accessible only within the organization (so it’s marked as INTERNAL). To share a clean room outside of an organization requires additional steps that won’t be covered in this tutorial. When sharing a clean room with yourself, it must be INTERNAL, of course.
You must use the SAMOOHA_APP_ROLE for most clean room procedures.
Provider: Bring data into the clean room¶
Next, bring your test data into the clean room. There are two steps to bring data into a clean room:
- Register the data.
- Link (import) the data into the clean room.
Register the data¶
The first step in importing data is to register the database, schema, or object in the clean room account. Registering is to grant clean rooms the right to read and use the source data. You can register an entire database, a schema, a table, or a view.
You are using sample data installed with the clean room, which is pre-registered for you, so there’s no need to register the sample data in this tutorial.
Link the data into the clean room¶
Importing data into a clean room is called linking. Both providers and consumers can link their data into a clean room. The generic term for a view or table linked into a clean room is a dataset.
When you link data, the clean room creates a read-only view linked to your source data. This clean room view is a secure, encrypted view inside the clean room, accessible only to templates within the clean room. Your template accesses this secure view, not the source data, although the original source name is used whenever you need to reference the data.
Unlike registering, linking is done at the individual table or view level. You can link multiple items in one call.
Link clean room sample data into the clean room:
Note
If a table linked into a clean room is deleted, renamed, moved, or has restrictive permissions added, the table can’t be used in the clean the clean room unless you restore the old table with the same location, name, and permissions.
Provider: Set join policies on the data¶
Both providers and consumers can specify join policies on their own data. A clean room join policy specifies which columns in a table can be joined on by your collaborators’ queries in that clean room. This provides an extra level of control over how others can use your data in the clean room. Your own policies are not enforced on your own queries – that is, join policies on your own data are ignored when you run a query; your policies are enforced only on queries run by other users.
Clean room join policies are set on the table, and apply to all clean rooms where the table is used. Any columns not listed here cannot be joined using INNER JOIN or OUTER JOIN conditions in the clean room if the template explicitly checks join policies.
Note that clean room join policies are not the same as Snowflake join policies; clean room policies specify which columns can be joined on; Snowflake join policies specify which columns can’t be joined on.
Tip
Snowflake policies set on the source table are retained in the linked clean room table, but aren’t reported to
collaborators. That is, Snowflake join policies are enforced but are not reported by consumer.view_provider_join_policy, which
reports only the provider’s clean room join policies. Therefore you should let your collaborators know about any Snowflake policies
that you have set on your data.
Specify joinable columns for a table using the format
database_name.schema_name.table_or_view_name:column_name for each column. The following example allows three columns
of provider data to be joinable:
Provider: Add your template¶
A clean room template is a JinjaSQL template that evaluates to a SELECT query. This query has access to all datasets linked into the clean room, subject to join and column policies.
This tutorial won’t cover the details of designing a JinjaSQL template, but here is the SQL query that you’re trying to implement:
The query simply joins one provider and one consumer table on a specified join column, groups by a specified grouping column, and projects the group value and count of each group. This is the query that will run in the clean room when the user runs the template.
Here is the JinjaSQL template for the same query, with variables added where the consumer can specify tables or columns. After the consumer specifies the variables, it will evaluate to a SQL query similar to the one above, but with the table and column names provided by the consumer.
A few notes on the template:
- Content surrounded by {{brackets}} are named variables passed in by the consumer when they run the template. The following variables are
passed in by the consumer:
group_by_col,consumer_join_col,provider_join_col - The
my_tableandsource_tablearrays are global variables created by the system, populated with consumer and provider table names passed in by the caller. These tables must be linked into the clean room by the consumer and provider. - All provider tables must be aliased as
pin the query. All consumer tables must be aliased asc. If you use multiple tables, alias them with a 1-based suffix, so:p,p1,p2,p3and so on for provider tables, andc,c1,c2,c3and so on for consumer table aliases. (pandp0are equivalent.) - Snowflake Data Clean Rooms supports some custom JinjaSQL filters that act on variables. The
column_policyandrow_policyfilters verify that the columns they are applied to conform to the column and row policies in that clean room, or else the request to run the template will fail. So{{ consumer_join_col | join_policy }}verifies that the value passed in toconsumer_join_colconforms to the join policies set by the provider and consumer in this clean room. - Variables used as identifiers must be processed by the IDENTIFIER function before they can be used in SQL.
Add the template to the clean room:
Provider: Set column policies¶
Each party in the clean room can limit which columns the other parties can project by setting a column_policy. A column policy in a clean room lists all the columns of your data that can be projected; no other columns can be projected. If you do not specify a column policy for your data, all your data can be projected.
A column policy is tied to a specific table and template in a clean room. You can allow different columns to be projected in different templates. The same column cannot be in both a join and a column policy.
Note that column and join policies are enforced only if the template uses the column_policy and row_policy filters in the template.
Here is how to allow projection of three columns of your data in the template you just created. Column syntax is
template_name:table_name:column_name
Provider: Add a release directive¶
Every clean room has a version number, consisting of major, minor, and patch values. You must specify which version of the clean room is served to consumers: this is called the default release directive.
This is the first version, so the version number is 1.0.0.
Snowflake creates a new version of the clean room each time you upload code into the clean room. If you want users to get the latest version, you’ll need to set a new default release directive with the newest version number. You won’t be uploading code, so you won’t need to call this again for this tutorial.
Provider: Designate consumers¶
Now you will specify who has access to your clean room as a consumer. For this tutorial you will add yourself as a consumer. Doing so marks the clean room as an internal testing clean room, used only for testing, which limits some of its functionality, but it will support all the features needed for this tutorial.
The procedure needs two arguments to identify each consumer:
-
The consumer’s account locator. Get your account locator like this:
-
The consumer’s consumer data sharing account ID, in the format
org_name.account_name. Get your consumer data sharing account ID in the proper format like this:
Now share the clean room with yourself as a consumer, adding your account locator and consumer data sharing account ID where indicated:
Provider: Publish the clean room¶
Finally, publish the clean room. This makes the clean room available to the consumer you added above. The procedure takes a minute or more to complete.
When the procedure finishes, you should see the clean room listed in the clean rooms UI, in the Created tab in your provider account, and in the Invited tab in the consumer account, with the label “Powered by Dev Edition.” The consumer account will receive an invitation email. (Do not install the clean room from the Invited tab; you will install it in code, in a later step.)
Congratulations: You’ve published your first clean room!
Now take off your provider cap and put on your consumer cap.
Consumer: Install (join) the clean room¶
You’ll use the same account for the provider and consumer roles in this tutorial, so add a new SQL worksheet named “API Tutorial - Consumer” in Snowsight in the same account.
Set up the session environment, similar to the way you did for the provider:
Next, install the clean room that you published and shared as a provider. To install a clean room, you must specify both the clean room
name and the account locator of the provider who shared the clean room with
you. Specifying the clean room name and the account locator helps disambiguate clean rooms with identical names. Run
SELECT CURRENT_ACCOUNT(); to get your provider locator.
Installation can take a few minutes.
Consumer: Link your data¶
You must register and link your data into the clean room, just as you did as a provider. Again, because you are using sample data provided with the clean room installation, you the data is pre-registered.
You will use a different sample table installed with clean rooms. If this table is not present in your account, see the Requirements section on the Introduction page to learn how to install it.
Consumer: No need to set policies¶
You could set policies on your data, the same way the provider did, but this template is approved for only the consumer to run, so there’s no need to set any policies on it.
However, if you were to approve a provider request to run this template, you should first set join and column policies on your data to control what the provider could do with it.
Consumer: Run the analysis¶
To run a query, you need the following information:
- The name of the template you want to run.
- The names of your tables to use in the template.
- The names of the provider’s tables to use in the template.
- Any other name/value variables to pass in.
Examine the template¶
You can examine the template to see what it does and any arguments that it accepts. The following example shows how to list the templates in the clean room, see a template’s code, and see what arguments it accepts:
You can see that you need to pass in a provider table and column name, a consumer table and column name, and a grouping column.
List the available provider tables¶
See which tables the provider has added to the clean room.
List the provider’s joinable and projectable columns¶
See which columns can be joined on or projected from the provider’s data.
Run the analysis¶
Now that we know what the query needs, what provider data is available, and what can be done with that data, you can select values to pass in.
You must fully qualify all column names in most circumstances. You must use the table alias as the
table name rather than the actual table name. Remember that the table aliases in this template are p for the provider table, and c
for the consumer table. You must use lowercase p and c.
In your first query, use the following values:
- Provider table: The only choice is
SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS. - Consumer table: The only choice is
SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS_2. consumer_join_col: Useage_bandfrom the consumer table; the fully qualified column name isc.age_band.provider_join_col: You need to join on similar columns, so the equivalent, fully qualified provider name isp.age_band.group_by_col: Take your pick of provider or consumer columns from the remaining projectable columns. Tryp.device_type, but you can use any of the other provider or consumer columns returned byconsumer.view_provider_column_policy.
These values are passed into consumer.run_analysis as shown in the following example:
Congratulations! You should see the query results in Snowsight.
Additional features not covered here allow you to export those results directly to your own Snowflake account, or to an approved third-party service in a process called Activation.
See more use cases and learn about more clean room features in the Snowflake Clean Rooms developer guide.
Both accounts: Clean up¶
Now let’s clean up all the resources that you created.
Provider cleanup¶
Run the following code in your provider worksheet:
Consumer cleanup¶
Run the following code in your consumer worksheet: