Use Snowflake Collaboration Data Clean Rooms¶

This topic provides a high-level guide to using Collaboration Data Clean Rooms, along with details on the key steps needed to create or participate in a collaboration.

Basic clean room collaboration workflow¶

Here is a simple clean room collaboration scenario:

  1. The collaboration owner optionally registers any templates or data offerings that they want to appear in the initial configuration of the collaboration.

  2. The owner optionally asks any intended collaborators to register any templates or data offerings that they want to appear in the initial configuration of the collaboration. Collaborators then give the owner the resource IDs of any items that they registered.

  3. The owner then creates a collaboration. The collaboration is defined by a collaboration YAML spec that lists the collaborators, their roles, and all resources that should be present in the initial version of the collaboration.

    • When a collaboration is created, the set of collaborators and their roles is fixed: only collaborators with a role in the collaboration definition are invited to join. Similarly, the set of analysis runners is fixed. However, any collaborator can also become a data provider by linking new data into the collaboration.

    • If your collaboration includes users in other cloud hosting regions, they must enable Cross-Cloud Auto-Fulfillment on their account before they can review and join the collaboration.

  4. The owner joins the collaboration that they created, which makes the collaboration active. The collaboration is now visible and joinable by any collaborator in the spec.

  5. Collaborators review and join the collaboration.

  6. Collaborators can then optionally add resources to the collaboration, such as templates and, if they are a data provider, data offerings.

  7. Analysis runners can then run any templates assigned to them in the collaboration, using any data available to them in the collaboration (and optionally unshared local data). The analysis runner bears the cost of the analysis. Templates can either return query results in the response or activate results to the caller or another collaborator.

The following sections describe the details of each of these steps.

Create a collaboration¶

To create a collaboration, you design a collaboration spec that defines all the collaborators and their roles. The collaboration owner optionally registers and links any other resources that they want to make available in the initial collaboration, and includes the resources in the collaboration spec. If the owner expects to use resources from collaborators, the owner can also prompt those users to register their resources and give the owner the resource IDs to include in the collaboration spec.

The owner then calls INITIALIZE to begin creating the collaboration. This is an asynchronous process, so the owner must call GET_STATUS until the status is CREATED.

The owner must then join their collaboration by calling JOIN. Joining is also asynchronous, so the owner must call GET_STATUS to track the join progress. When the collaboration status is JOINED, the collaboration is active, and all collaborators can see and join it. Until the owner joins, no collaborators can see or join the collaboration.

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.INITIALIZE(
$$
api_version: 2.0.0
spec_type: collaboration
name: my_first_collaboration
owner: alice
collaborator_identifier_aliases:
  alice: example_com.acct_abc
  bob: another_example.acct_xyz
analysis_runners:
  bob:
    data_providers:
      alice:
        data_offerings: []
      bob:
        data_offerings: []
  alice:
    data_providers:
      alice:
        data_offerings: []
      bob:
        data_offerings: []
    templates: []
$$
);
SET collaboration_id = '<collaboration_id>';

-- Creating a collaboration is asynchronous. Check to see when collaboration is CREATED.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_id);

-- Collaboration is visible here when it's created.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_COLLABORATIONS();

-- Join the collaboration you created.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.JOIN($collaboration_id);

-- Joining a collaboration is asynchronous. Call get_status until the status is JOINED.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_id);

Add resources to a collaboration¶

Any collaborator can add resources to a collaboration, or remove resources that they have added to the collaboration.

Resources can be added to a collaboration either before or after a collaboration is created.

Resources support versioning; however, creating a new resource with a new version doesn’t remove the previous version from the collaboration.

Resources are uniquely named by combining the user-provided name and version (and alias, for data offerings).

You can add the following resources to a collaboration:

Templates¶

Templates are JinjaSQL clean room templates that can be run by specified collaborators. Any collaborator can register and add a template to a collaboration, as long as all affected collaborators approve the request, as described in the following steps. You can add or remove only templates that your account has registered.

To add a template to a collaboration:

  1. Design a template for the collaboration and embed it in a template specification.

  2. Call REGISTRY.REGISTER_TEMPLATE, which returns a template ID.

  3. Add a template. The process depends on whether the collaboration already exists:

    • To add a template before the collaboration is created, give the template ID to the collaboration owner, who adds it to the collaboration spec, defining who can run it.

       alice:
         data_providers:
           bob:
             data_offerings: []
         templates:
         - id: bob_template_v1 # Alice can run this template, which bob registered.
      
    • To add a template to an existing collaboration, you must request permission from all collaborators affected by the template. Take the following steps (a sketch of the full flow appears after this procedure):

      1. Call REGISTER_TEMPLATE to register the template in your account, which makes it available to add to collaborations.

      2. Call ADD_TEMPLATE_REQUEST with the template ID to start the approval flow to add the template to a specific collaboration, for specific users.

        All collaborators affected by the resource see the request when they call VIEW_UPDATE_REQUESTS.

      3. Collaborators who see the request with status PENDING should call APPROVE_UPDATE_REQUEST or REJECT_UPDATE_REQUEST.

        • If any collaborator rejects the request, the update request is rejected.

        • Collaborators can’t later change an approval to a rejection, or a rejection to an approval.

When the request status is APPROVED, the template is available to the users specified in the add template request. If the request is REJECTED, any reason supplied by the rejecting party is visible in the request report.
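
A minimal sketch of this flow, assuming the collaboration ID is stored in $collaboration_id; the schema qualifiers and argument lists for the request and approval procedures are assumptions, so check the procedure reference for the exact signatures:

-- Registering collaborator: register the template and note the returned template ID.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_TEMPLATE(<template_spec_yaml>);

-- Registering collaborator: request that the template be added to the collaboration
-- for specific users.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.ADD_TEMPLATE_REQUEST($collaboration_id, <template_id>);

-- Affected collaborators: view the pending request, then approve (or reject) it.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_UPDATE_REQUESTS($collaboration_id);
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.APPROVE_UPDATE_REQUEST(<request_id>);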

Tip

To see which templates you have registered, call REGISTRY.VIEW_REGISTERED_TEMPLATES.

Template design for a collaboration¶

Collaboration templates are the same as clean room templates, with a few special considerations:

  • Shared tables listed in the collaboration are used to populate the template’s source_table variable.

  • my_table is used only if an analysis runner wants to use local, unshared data. If you use the my_table variable in a template, be aware that tables assigned to that variable aren’t shared with the collaboration.

  • Columns from the data sources might have new names when exposed to the template or user. See Source column renaming to learn how and when source columns are renamed. Templates and user-provided arguments (such as a join column name) should use the final name, not the original name, if the column is renamed.

  • Activation templates in a collaboration don’t need to be named activation_template_name. All other activation template requirements still apply.

For information about custom template syntax in Snowflake Data Clean Rooms, see Custom clean room template reference.
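
For illustration, here is a minimal (non-activation) template sketch patterned on the activation template example later in this topic; the result table name and the c_join_col and p_join_col arguments follow that example, and the template specification wrapper that you pass to REGISTER_TEMPLATE isn't shown:

BEGIN
  CREATE OR REPLACE TABLE cleanroom.overlap_results AS
    SELECT COUNT(*) AS overlap_count
    -- source_table is populated from the shared tables listed in the collaboration.
    FROM IDENTIFIER({{ source_table[0] }}) AS p
    -- my_table is populated only when the analysis runner supplies local, unshared data.
    JOIN IDENTIFIER({{ my_table[0] }}) AS c
    ON {{ c_join_col | sqlsafe }} = {{ p_join_col | sqlsafe }};
  RETURN 'overlap_results';
END;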

Data offerings¶

A data offering is a set of one or more data views shared with specific analysis runners in a collaboration. Data offerings can be added by any data provider listed in a collaboration. Data offerings are exposed in a scoped format as <data offering ID>.<alias>, where the alias identifies a specific view in the data offering. You can share a data offering with a given collaborator only if you are listed as a data provider for that analysis runner in the collaboration specification.

A data offering is a live view of the data, not a snapshot of the data at the time the data offering is created or registered. Any Snowflake policies applied to the source data are active in the collaboration.

When you register a data offering, Snowflake creates a view for each data source listed in the data offering specification. The view includes only the columns listed in the specification. When you link a data offering into a collaboration, Snowflake creates a copy of that view, with access restricted to the analysis runners who can access that data offering according to the collaboration specification. If you move, rename, or change access permissions on the underlying tables, the data offering becomes unusable through any previously registered links.

If you are using Snowflake Standard Edition, you can’t share data offerings with other collaborators, but you can use your own data in a query.

Note

You must have OWNERSHIP on any data that you want to share. If you don’t, you receive a “missing reference usage grant” error when you try to join the collaboration. Learn how to handle this issue.

You can remove data resources from a collaboration by calling UNLINK_DATA_OFFERING.

Data offerings are added to a collaboration with the following steps:

  1. The data owner registers the data offering by calling REGISTRY.REGISTER_DATA_OFFERING, which returns a data offering ID.

    This step makes the data offering available to be linked into any collaboration that you can access. You can use the same data offering ID to share a data offering with multiple collaborations.

  2. The next step depends on whether or not the collaboration has been created:

    • If the collaboration hasn’t been created yet, the data provider gives the data offering ID to the collaboration creator to add to the collaboration definition. A data offering added to the collaboration definition becomes visible to everyone in the collaboration after the data provider joins.

    • If the collaboration has been created, the data provider joins the collaboration and calls COLLABORATION.LINK_DATA_OFFERING with the data offering ID, the collaboration name, and who the data can be shared with.

Each data offering represents one or more tables or views. Individual tables are accessed using the syntax <collaborator alias>.<data offering ID>.<dataset alias>, where the data offering ID is a combination of the user-provided name and version values, and the dataset alias identifies a single table in the offering. Think of the name, version, and alias as a scoping system when registering your data offerings.

For example, you might register the following data offering of sales data, where each table is specific to a US state:

api_version: 2.0.0
spec_type: data_offering
version: v0
name: examplecorp_sales_by_state
datasets:
 - alias: AL
   data_object_fqn: mydb.mysch.al_data
 - alias: NY
   data_object_fqn: mydb.mysch.ny_data
 - alias: CA
   data_object_fqn: mydb.mysch.ca_data

The analysis runner would then reference these tables as <data offering ID>.AL, <data offering ID>.NY, or <data offering ID>.CA.
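
A minimal sketch of registering this offering and linking it into an existing collaboration follows; the argument list for LINK_DATA_OFFERING (collaboration ID, data offering ID, and the collaborators who can use the data) is an assumption based on the description above:

-- Register the data offering; the call returns a data offering ID.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_DATA_OFFERING(
$$
... data offering YAML shown above ...
$$
);

-- Link the registered offering into an existing collaboration.
-- The argument list here is an assumption; see LINK_DATA_OFFERING for the exact signature.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.LINK_DATA_OFFERING($collaboration_id, <data_offering_id>, <analysis_runner_aliases>);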

Data offerings aren’t visible in a collaboration until the user who registered the data offering joins the collaboration.

Tip

If you don’t have OWNERSHIP on data that you share, you will get an error when you try to join the collaboration or link your data offering. The error message gives information about a SQL command that an ACCOUNTADMIN must run to grant data access to the collaboration. After the ACCOUNTADMIN runs the command, you will be able to join the collaboration. See more information.

When running a query, analysis runners pass the data offerings by ID to the source_tables parameter of COLLABORATION.RUN.

To see your registered data offerings, call VIEW_REGISTERED_DATA_OFFERINGS.

Apply usage policies to your data¶

There are two ways to apply a Snowflake column policy, such as a join or aggregation policy, to your shared data:

  • Apply the policy on the source data. Any policies applied to the source data are enforced in the datasets exposed in a collaboration. Ensure that you communicate your policy to the user.

  • Apply the policy on the data offering (free-form queries only). If you allow free-form queries on your data offerings, you can specify column policies in the data offering description that are enforced on free-form queries against your data. These policies are applied on top of any existing Snowflake policies on your source tables.

Apply the policy to your source data¶

Any Snowflake policies applied to the source data also apply to the data offering view in the collaboration.

If you apply Snowflake policies to your source data, be sure to let your collaborators know about them, so that they don’t unknowingly try to run a query that joins on a non-joinable column or doesn’t group when it should. You might mention any Snowflake policies in your data offering’s description field.

Apply the policy on the data offering (free-form query usage only)¶

You can apply Snowflake policies to free-form queries against your shared data without applying them to the source data. These policies are enforced when your data is accessed through free-form queries, in addition to any Snowflake policies applied directly to the source table.

To add free-form SQL policies to your data:

  1. Create a Snowflake policy of a type supported by Collaboration Data Clean Rooms.

  2. Add the following information to your data offering definition:

    • Set allowed_analyses: template_and_freeform_sql.

    • Add a freeform_sql_policies section to the dataset definition in your data offering description.

    • Add the appropriate policy type sections under freeform_sql_policies, listing the Snowflake policies that you created, and which collaboration columns to apply them to.

Collaborators see policy types applied to your data when they call COLLABORATION.VIEW_DATA_OFFERINGS.

You can reuse a policy on multiple columns on multiple tables.

For example, the following statement creates an aggregation policy that you can list in a freeform_sql_policies section:

CREATE OR REPLACE AGGREGATION POLICY my_db.public.my_agg_policy AS ()
  RETURNS AGGREGATION_CONSTRAINT ->
    AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 5);
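
The following dataset snippet sketches how that policy might then be listed in a data offering definition. The exact field names under freeform_sql_policies aren't shown in this topic, so the aggregation_policies, policy, and columns keys below are assumptions for illustration only:

... Snippet from data offering yaml ...
datasets:
  - alias: customers
    data_object_fqn: my_db.public.customers
    allowed_analyses: template_and_freeform_sql
    freeform_sql_policies:
      # The field names in this section are assumptions; list the Snowflake
      # policies that you created and the columns they apply to.
      aggregation_policies:
        - policy: my_db.public.my_agg_policy
          columns:
            - purchase_amount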

Source column renaming¶

Column names exposed to the template or free-form SQL caller are determined by the category and column_type values that describe the column in its data offering definition. Column renaming follows these rules:

  • If category for the column is join_custom or passthrough, the original column name is exposed.

  • If category is join_standard, the column is renamed to the column_type value.

  • If category is timestamp, the column is renamed to timestamp in the data offering.

For example, if the column in the source table is named user_email_address, how this column is exposed to the template or free-form SQL depends on how it is defined in the data offering definition:

  • If the column category is join_standard and column_type is present:

    ... Snippet from data offering yaml ...
    schema_and_template_policies:
         user_email_address:
           category: join_standard
           column_type: hashed_email_sha256
    

    Then the column_type value is used in queries and templates:

    SELECT HASHED_EMAIL_SHA256 FROM source_table[0];
    
  • If the column category is join_custom:

    ... Snippet from data offering yaml ...
    schema_and_template_policies:
         user_email_address:
           category: join_custom
           column_type: hashed_email_sha256
    

    Then the original source column name is used in queries and templates:

    -- column_type is ignored for join_custom columns.
    SELECT user_email_address FROM source_table[0];
    

Join a collaboration¶

You must join a collaboration before any resources that you contributed become available in the collaboration, and before you can run an analysis in it.

  • The creator calls only COLLABORATION.JOIN.

  • Non-creators call COLLABORATION.REVIEW, and then COLLABORATION.JOIN.

    Important

    If your account is on a different cloud hosting region than the collaboration owner’s, you must enable Cross-Cloud Auto-Fulfillment on your account before you can review and join the collaboration. See Enable Cross-Cloud Auto-Fulfillment.

Joining is an asynchronous process; call GET_STATUS to see when your status is listed as JOINED.
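
For non-creators, a minimal sketch of the review-and-join flow, assuming that REVIEW takes the collaboration ID as its argument (JOIN and GET_STATUS are used as in the creation example earlier in this topic):

-- Review the collaboration details before joining (non-creators only).
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.REVIEW($collaboration_id);

-- Join the collaboration.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.JOIN($collaboration_id);

-- Joining is asynchronous; call GET_STATUS until the status is JOINED.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_id);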

Run an analysis¶

You can run analyses either by running a template or by running a free-form SQL query on the collaboration data. You must be a designated analysis runner in a collaboration to run an analysis. The collaboration specification determines whether you can run a template, activate results, or run free-form SQL queries, and describes the data and templates available to you.

The analysis runner bears the cost of running an analysis.

Run an analysis from a template¶

To run an analysis from a template, view the list of templates that you can run and the list of data offerings that you can use, and then call COLLABORATION.RUN with your values either as individual parameters or as an analysis specification in YAML format:

-- See which data offerings are available.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS($collaboration_id);

-- See which templates you can run.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_TEMPLATES($collaboration_id);

-- Pass in the arguments in analysis YAML format.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.RUN(
  $collaboration_id,
  $$
    api_version: 2.0.0
    spec_type: analysis
    name: My_analysis
    description: Sales results Q2 2025
    template: 12345

    template_configuration:
      view_mappings:
        source_tables:
          - {{table_1_name}}
          - {{table_2_name}}
      arguments:
         conv_purchase_id: PURCHASE_ID
         conv_purchase_amount: PURCHASE_AMOUNT
         publisher_impression_id: IMPRESSION_ID
         publisher_campaign_name: CAMPAIGN_NAME
         publisher_device_type: DEVICE_TYPE
  $$ );

Enable and run free-form SQL queries on your data¶

A data provider can enable analysis runners to run SQL queries against their collaboration data offerings. To run free-form SQL queries against a data offering, you must be a member of the collaboration and be an analysis runner with free-form SQL permission on that data offering.

Data provider steps¶

To enable collaborators to run free-form queries against a dataset, set allowed_analyses: template_and_freeform_sql for that dataset in your data offering specification. Users who join the collaboration can then run free-form SQL queries on the datasets that they can access.

The following YAML defines a dataset that allows free-form queries:

api_version: 2.0.0
version: 1
name: my_favorite_dataset
datasets:
  - alias: test_freeform_restricted_agg
    data_object_fqn: samooha_provider_sample_database.audience_overlap.customers
    object_class: custom
    allowed_analyses: template_and_freeform_sql
...

Analysis runner steps¶

  1. After joining a collaboration, the analysis runner calls COLLABORATION.VIEW_DATA_OFFERINGS to see which datasets support free-form queries. The FREEFORM_SQL_VIEW_NAME column in the results shows which tables can be accessed using free-form SQL and the table name to use in the SQL query.

    CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS($collaboration_id);
    
    +-------------------------------+------------------------+------------------------------+------------------------------+---------------------------------------+
    |   template_view_name          | template_join_columns  | analysis_allowed_columns     | activation_allowed_columns   |      freeform_sql_view_name           |
    +-------------------------------+------------------------+------------------------------+------------------------------+---------------------------------------+
    | useralias.data_offering_alias |     ip_address         | email, name, age             |             SSN              | SOMETESTVIEW1234.FREEFORM_SQL.dataset1|
    +-------------------------------+------------------------+------------------------------+------------------------------+---------------------------------------+
    
  2. The collaborator can then query the table listed in the FREEFORM_SQL_VIEW_NAME column using free-form SQL queries:

    SELECT * FROM SOMETESTVIEW1234.FREEFORM_SQL.dataset1;
    

All policies applied to the table are enforced.

Run an analysis with your own data when you use Standard Edition¶

If you use Standard Edition, you can run an analysis in the standard way. However, you can’t add your data to the collaboration and share it with other users.

To use your own data in a collaboration as a Standard Edition user:

  1. To register your data offering, call REGISTRY.REGISTER_DATA_OFFERING. You must specify column names.

  2. Call COLLABORATION.LINK_LOCAL_DATA_OFFERING to link the registered offering into the collaboration for your own use (a sketch of these calls appears after these steps).

    Only you can see your offering when you call COLLABORATION.VIEW_DATA_OFFERINGS; other collaborators won’t see your data source listed.

  3. Use the data offering ID when you call COLLABORATION.RUN, in either the local_template_view_names parameter, or the local_view_mappings.my_tables field if passing in an analysis YAML. local_template_view_names and local_view_mappings.my_tables populate the my_table parameter in the template.
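
A minimal sketch of registering and linking your local data before calling RUN; the argument lists shown for REGISTER_DATA_OFFERING and LINK_LOCAL_DATA_OFFERING are assumptions, so check the procedure reference for the exact signatures:

-- Register your local data offering; column names must be specified in the spec.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.REGISTRY.REGISTER_DATA_OFFERING(
$$
... data offering YAML ...
$$
);

-- Link the offering into the collaboration for your own use only.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.LINK_LOCAL_DATA_OFFERING($collaboration_id, <data_offering_id>);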

The following example shows how to run a template using the YAML format version of the run procedure. This example includes the my_tables field, which is populated with the data offering that you linked by calling LINK_LOCAL_DATA_OFFERING.

-- See what data offerings are available. Your own local data will be listed here as well.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS($collaboration_id);

-- Pass in the arguments in analysis YAML format.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.RUN(
  $collaboration_id,
  $$
    api_version: 2.0.0
    spec_type: analysis
    name: my_analysis
    description: Cross-purchase results for Q4 2025
    template: mytemplate_v1

    template_configuration:
      view_mappings:
        source_tables:
          - ADVERTISER1.ADVERTISER_DATA_V1.CUSTOMERS
          - PUBLISHER.ADVERTISER_DATA_V1.CUSTOMERS
      local_view_mappings:
        my_tables:
          - PARTNER.MY_DATA_V1.MY_CUSTOMERS # Populate my_table array with my own table.
      arguments:  # Template arguments, as name: value pairs
         conv_purchase_id: PURCHASE_ID
         conv_purchase_amount: PURCHASE_AMOUNT
         publisher_impression_id: IMPRESSION_ID
         publisher_campaign_name: CAMPAIGN_NAME
         publisher_device_type: DEVICE_TYPE
  $$ );

Activate query results¶

Note

If you aren’t using the SAMOOHA_APP_ROLE role — that is, you are using a role managed by access management procedures — you must have the REGISTER DATA OFFERING privilege to join any collaboration where you are an analysis runner and the collaboration specification includes an activation_destinations field.

To activate the results of a query:

  1. Ensure that all activation columns have the following properties set in the appropriate spec:

    The data offering specification for the table with the activated column must set activation_allowed: TRUE for that column:

     api_version: 2.0.0
     spec_type: data_offering
     name: 2025_orders
     version: 2025_01_01_v1
     description: Activating Cleveland sales results for 2025
    
     datasets:
      - alias: customers
        data_object_fqn: db1.schema1.orders
        allowed_analyses: template_only
        object_class: custom
        schema_and_template_policies:
          email:
            category: join_standard
            column_type: hashed_email_sha256
            activation_allowed: TRUE
          purchase_amount:
            category: passthrough
            activation_allowed: TRUE
    

    Note

    Any column used in the template with the activation_policy filter applied must have its activation_allowed value set to TRUE in the data offering specification. The following example shows a template with the activation policy applied to two columns supplied by the analysis runner:

    BEGIN
      CREATE OR REPLACE TABLE cleanroom.activation_data_analysis_results AS
        SELECT count(*) AS ITEM_COUNT, c.status, c.age_band
        FROM IDENTIFIER({{ my_table[0] }}) AS c
        JOIN IDENTIFIER({{ source_table[0] }}) AS p
        ON {{ c_join_col | sqlsafe | activation_policy }} = {{ p_join_col | sqlsafe | activation_policy }}
        GROUP BY c.status, c.age_band
        ORDER BY c.age_band;
      RETURN 'analysis_results';
    END;
    
  2. The analysis runner calls RUN to run the analysis.

    • If activating to yourself, results are available immediately in the caller’s account in the table consumers_database.ACTIVATION_RESULTS.CONSUMER_DIRECT_ACTIVATION_SUMMARY. To learn how to view the query results, see the last step.

    • If activating to another collaborator:

      1. The collaborator calls VIEW_ACTIVATIONS until it returns a status of SHARED. Activating to another account can take considerable time for large result sets, as the data must be transferred to the collaborator’s account and decrypted.

      2. When the status of the activation is SHARED, the collaborator calls PROCESS_ACTIVATION to send the results to their account. The response to PROCESS_ACTIVATION includes the table and segment names. This sets the activation status to PROCESSED.

  3. To retrieve query results, run the following SQL command, providing your results table name and, optionally, a segment name to filter results:

    SELECT *
      FROM <results_table_name>
        [WHERE segment = <segment_name>];
    
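A minimal sketch of the cross-collaborator flow from step 2; the argument lists for VIEW_ACTIVATIONS and PROCESS_ACTIVATION are assumptions, and the activation ID is a placeholder:

-- Receiving collaborator: check the activation status until it is SHARED.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_ACTIVATIONS($collaboration_id);

-- When the status is SHARED, import the results into your account.
-- The response lists the table and segment names.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.PROCESS_ACTIVATION(<activation_id>);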

Leave or delete a collaboration¶

  • Non-owners leave a collaboration by calling COLLABORATION.LEAVE. Any data offerings that they have provided are removed from the collaboration. A collaborator can’t rejoin a collaboration after leaving it.

  • Collaboration owners can’t leave a collaboration; ownership can’t be transferred. A collaboration owner can drop a collaboration for all collaborators by calling COLLABORATION.TEARDOWN.

Both processes are asynchronous. You must call GET_STATUS to monitor the status, and call LEAVE or TEARDOWN again when GET_STATUS shows the status as LOCAL_DROP_PENDING.
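
A minimal sketch of leaving as a non-owner, assuming that LEAVE (like TEARDOWN) takes the collaboration ID as its argument:

-- Leave the collaboration (non-owners).
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.LEAVE($collaboration_id);

-- Leaving is asynchronous; call LEAVE again when the status is LOCAL_DROP_PENDING.
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.GET_STATUS($collaboration_id);
CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.LEAVE($collaboration_id);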

Enable Cross-Cloud Auto-Fulfillment¶

If you aren’t in the same cloud host region as the collaboration owner, Cross-Cloud Auto-Fulfillment (LAF) must be enabled for your account before you can join the collaboration. If you try to review a collaboration in another cloud region and LAF isn’t enabled for your account, or you don’t have the proper permissions, you receive an error when you call REVIEW on the collaboration.

Note

When Cross-Cloud Auto-Fulfillment is used in a collaboration:

  • Data is replicated into the account of each collaborator that can access that data.

  • Data is also replicated into the owner’s region, whether or not they can access the data offering. However, their ability to access the data is determined by the data offering’s sharing rules.

You can determine your own cloud host region by running SELECT CURRENT_REGION();

To enable Cross-Cloud Auto-Fulfillment in your account:

  1. An org admin must enable LAF on your account by calling SYSTEM$ENABLE_GLOBAL_DATA_SHARING_FOR_ACCOUNT. For more information, see Manage privileges for auto-fulfillment.

  2. To have the proper privileges to review or join a LAF collaboration, use either SAMOOHA_APP_ROLE, or use a role granted the MANAGE LISTING AUTO FULFILLMENT account-level privilege.
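
A minimal sketch of these steps; the account name is a placeholder, and the SYSTEM$ENABLE_GLOBAL_DATA_SHARING_FOR_ACCOUNT call must be run by an organization administrator:

-- Check your own cloud host region.
SELECT CURRENT_REGION();

-- Organization administrator: enable Cross-Cloud Auto-Fulfillment (LAF) for the account.
SELECT SYSTEM$ENABLE_GLOBAL_DATA_SHARING_FOR_ACCOUNT('<account_name>');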

Collaborators located in a different cloud host region will experience some additional data lag due to replication frequency. The replication frequency isn’t yet configurable in a Collaboration Data Clean Room.

Example: Two-party collaboration¶

The following example demonstrates a two-party collaboration, where one party (named “alice”) is the collaboration creator, a data provider for herself and bob, and an analysis runner. The other party (named “bob”) is a data provider for himself and alice, and an analysis runner.

The example demonstrates the following actions:

  • Creating a collaboration.

  • Registering templates and data offerings.

  • Adding a template and data offering at collaboration creation time.

  • Joining a collaboration.

  • Adding templates and resources to an existing collaboration.

  • Running an analysis.

To run this example, you must have two separate accounts with Snowflake Data Clean Rooms installed.

You can either download the files and upload them to your Snowflake account, or copy and paste the example code into worksheets in two separate accounts by using Snowsight.

Download the source SQL files, and then upload them into two separate accounts that have Snowflake Data Clean Rooms installed: