Tutorial: Get started with collaboration clean rooms (API)¶
Introduction¶
This tutorial is aimed at developers who want to create and use collaboration clean rooms using the Snowflake Data Clean Rooms API. You will work through a two-account scenario where two collaborators share data, register templates, and run analyses.
What you will learn¶
This tutorial shows you how to:
- Register data offerings and templates in the clean rooms registry.
- Create a collaboration using a YAML collaboration specification.
- Join a collaboration from a second account.
- Link templates and data offerings to an existing collaboration.
- Run analyses as different collaborators with different permissions.
Requirements to run this tutorial¶
- Two Snowflake accounts, Enterprise Edition or higher, each with the Snowflake Data Clean Rooms environment installed. If clean rooms isn’t installed, see Installing the Snowflake Data Clean Rooms environment.
- The SAMOOHA_APP_ROLE must be granted to the user in each account.
Note
This tutorial requires two separate Snowflake accounts. You will run Alice’s steps in one account and Bob’s steps in the other. Each section heading indicates which account to use.
The tutorial includes code snippets with <placeholders> that you should replace with the appropriate values.
Collaboration basics¶
A collaboration clean room allows multiple parties to share and analyze data securely without exposing raw data to each other. Collaborations are defined by a YAML specification that lists the collaborators, their data, and what each party can do.
Key concepts used in this tutorial:
-
Collaboration specification: A YAML document that defines the collaborators, their aliases, roles, data offerings, and templates.
-
Collaboration roles: Each collaborator is assigned one or more roles:
-
Owner: Creates and manages the collaboration. There is exactly one owner per collaboration.
-
Data provider: Contributes data offerings that other collaborators can use in analyses.
-
Analysis runner: Runs templates against the shared data. Each analysis runner has a list of data providers and templates available to use.
In this tutorial:
- Alice is the collaboration owner, a data provider, and an analysis runner.
- Bob is a data provider and an analysis runner.
-
-
Data offering: A table linked into the collaboration by a data provider.
-
Template: A registered JinjaSQL query that analysis runners execute against a data offering.
In this tutorial, Alice and Bob each contribute one data offering. Alice registers a template that only Alice can run. Bob registers a template that both Alice and Bob can run. Both templates join the two collaborators’ data on a shared column.
Alice: Register resources¶
Run the following steps in Alice’s account to set up the session environment and create sample data:
Note
In real-world usage, we recommend assigning more fine-grained privileges to your users instead of using the top-level SAMOOHA_APP_ROLE role.
Next, you will register a data offering and a template to use in the collaboration.
Register a data offering¶
A data offering is a registered dataset with column-level policies that control how collaborators can use the data. You will create a data offering from the sample data you created.
Register Alice’s data offering so that it can be included in the collaboration specification. Data offerings are defined by a YAML data offering specification.
The data-offering specification defines the following properties:
datasets: A list of tables or views to share.alias: A short name used to reference this dataset within this spec and by templates.allowed_analyses: Restricts usage to templates only (no free-form SQL).schema_and_template_policies: Defines the format, naming, and availability of columns from this data source. The data offering exposes only two columns from your source table:hashed_email(which must be used as a join column) andstatus.
Save the data offering ID from the response. You will need it when creating the collaboration specification.
Register a template¶
A template is a JinjaSQL query that analysis runners execute in the collaboration. Register a template that joins two tables on the hashed email column and counts matches grouped by status:
The template uses two tables source_table[0] and source_table[1]. These are data offerings present in the collaboration. The analysis runner passes in the names of the tables to use when they run the analysis.
Note
This tutorial uses T and T1 as table aliases for simplicity. In production templates, you should use the
standard aliases p, p1, p2, and so on, which are required for
Snowflake Data Clean Room policy enforcement.
Save the template ID from the response:
Alice: Create the collaboration¶
Continue in Alice’s account.
Now that the resources are registered, Alice creates the collaboration. The collaboration is defined by a YAML specification that lists the collaborators, their roles, data offerings, and templates.
Call INITIALIZE with the collaboration specification. Review the YAML carefully before running it:
Understanding the collaboration specification¶
The collaboration specification uses the aliases defined in the collaborator_identifier_aliases section to refer to all collaborators.
The collaboration defines the following roles and relationships:
-
analysis_runnerslists the collaborators that can run analyses in this collaboration. Only collaborators listed at the top level here can run analyses. The list of analysis runners cannot be modified after the collaboration is created. -
Each analysis runner entry has the following elements:
data_providers: Only these data providers can supply data to this analysis runner. This list cannot be modified later.data_offerings: Only these data offerings from the listed data providers can supply data to this analysis runner. The data offerings list can be updated after a collaboration is created.
-
Only the templates listed for an analysis runner can be used. The template list can be modified later. Notice that this collaboration currently shares the
alice_only_templatewith Alice.
Wait for the collaboration to be created and joined¶
INITIALIZE creates and joins the owner to the collaboration. This process is asynchronous.
Call GET_STATUS until Alice’s status is JOINED:
Enable template auto-approval¶
Whenever a collaborator asks to share a template with you in a collaboration, all designated recipients must approve the request before the template is shared. This tutorial doesn’t cover the approval flow, so run the following code to automatically approve all requests:
Bob: Join the collaboration¶
Switch to Bob’s account and run the following steps.
Set up the session environment:
View the collaboration invitation, then join it:
Joining is asynchronous. Call GET_STATUS until Bob’s status is JOINED:
Bob: Link a template¶
Continue in Bob’s account.
Bob creates and registers a template. This template joins two tables on the hashed email column, and returns a cross-tabulation of statuses from both tables. Templates are written in JinjaSQL; a template specification is written in YAML, which contains the embedded JinjaSQL template.
Now request to link the template to the collaboration, and share it with both Alice and Bob. (Notice how you must explicitly request to share a template with yourself; templates are not automatically shared with the account that registered them.)
Because Alice enabled template auto-approval, Alice automatically approves the request:
The template should be approved and added shortly.
Bob: Link a data offering¶
Continue in Bob’s account.
Create sample data and register a data offering:
The collaboration specification lists bob as a potential data provider for both alice and bob. Link the data offering into the
collaboration, and share it with both alice and bob:
You don’t need approvals when you share data with other collaborators.
Alice: Run an analysis¶
Switch back to Alice’s account.
Alice runs the alice_only_template, which is available only to Alice. The template joins Alice’s data offering
with Bob’s data offering on the hashed email column and groups results by status.
First, view the available data offerings and templates:
Now, run the analysis. Replace the placeholders with actual values. Replace the source_tables names with view names from
VIEW_DATA_OFFERINGS.
The results show the count of matching records grouped by status from Alice’s data.
Bob: Run an analysis¶
Switch to Bob’s account.
Bob runs the bob_shared_template, which is available to both collaborators. This template cross-tabulates the statuses from
both tables.
Replace the source_tables placeholders with actual view names from VIEW_DATA_OFFERINGS.
The results show a cross-tabulation of statuses from both data offerings, with the count of matching records for each combination.
Bob cannot run alice_only_template because Alice did not include Bob as a permitted user
for that template in the collaboration specification. Try running it to see what happens.
Alice: Clean up resources¶
Switch to Alice’s account to clean up all the resources used.
Tearing down a collaboration is a multi-step process. Call TEARDOWN, wait for the status to reach LOCAL_DROP_PENDING,
and then call TEARDOWN again:
Switch to Bob’s account and clean up the sample database:
Note
Tearing down the collaboration doesn’t delete registered templates or data offerings from either account’s registry.
Summary¶
In this tutorial, you learned how to:
- Register templates and data offerings in the clean rooms registry.
- Create a collaboration with a YAML specification that defines collaborators, roles, data, and templates.
- Join a collaboration from a second account.
- Link templates and data offerings to an existing collaboration using update requests.
- Run analyses as different collaborators with different levels of access.
Next steps¶
- You can find tutorials, sample worksheets, and video walkthroughs in Sample Worksheets and Videos.
- Learn more about collaboration specifications in the Overview of Snowflake Data Clean Rooms.
- Explore the full Snowflake Data Clean Rooms Collaboration API.
- Learn about collaboration roles and how to manage access.
- Learn how to activate query results to export data to a collaborator’s account.