Snowflake Data Clean Rooms: Multi-Provider Clean Rooms¶
This topic provides an example of running an analysis across a cluster of clean rooms from multiple providers. It demonstrates how a query can access data across clean rooms while maintaining the security guarantees of each clean room. It also demonstrates how the consumer can define the template used to run the analysis, which is a common use case for multi-provider clean rooms.
In a multi-provider analysis, data is used from each clean room in a completely secure manner. Snowflake Data Clean Rooms ensures compliance with every individual clean room security policy, while also attesting that all SQL run against the data is exactly what each clean room participant expects.
In many cases, the consumer executing a multi-provider analysis wants to be able to define the template for the analysis rather than use a template defined by the providers. This allows consumers to control how they gain insights when analyzing data from multiple parties. For more information about consumer-defined templates, see Using the developer API to add consumer-defined templates.
Note
Multi-provider clean room analyses are not allowed in clean rooms which have been shared across regions.
The multi-provider analysis example includes the following steps:
Provider:
a. Create two clean rooms, which simulates two different providers.
b. Share the clean rooms with the same consumer.
Consumer:
a. Install both provider clean rooms.
b. Send requests to add a consumer-defined template to the clean rooms.
Provider:
a. Approve the consumer’s requests to add a consumer-defined template.
b. Set column policies for the consumer-defined template.
Consumer:
a. Raise multi-provider analysis request.
Provider:
a. Enable the clean rooms for multi-provider analysis.
b. Approve multi-provider analysis requests from the consumer.
Consumer
a. Execute the analysis across the clean room cluster.
Prerequisites¶
You need two separate Snowflake accounts to complete this example. Use the first account to execute the provider’s commands, then switch to the second account to execute the consumer’s commands.
Consumer: Install clean rooms¶
Use a Snowflake worksheet in the consumer account to execute the commands in this section.
As a consumer, you can have multiple clean rooms shared with you. Before running a multi-provider analysis across them all, you need to install each one into your account and link the datasets you intend to use for analyses.
Set up the environment¶
Before using the Developer APIs, you need to assume the SAMOOHA_APP_ROLE role and specify the warehouse used to execute the APIs. If you don’t have the SAMOOHA_APP_ROLE role, contact your account administrator.
Execute the following commands to set up your environment:
USE ROLE samooha_app_role;
USE WAREHOUSE app_wh;
Install the clean rooms¶
Before installing the clean rooms, assign a name for each clean room that the provider has shared with you.
set cleanroom_name_1 = 'Samooha Cleanroom Multiprovider Clean Room 1';
set cleanroom_name_2 = 'Samooha Cleanroom Multiprovider Clean Room 2';
The following commands install the clean rooms in the consumer account:
CALL samooha_by_snowflake_local_db.consumer.install_cleanroom($cleanroom_name_1, '<PROVIDER_ACCOUNT_LOCATOR>');
CALL samooha_by_snowflake_local_db.consumer.install_cleanroom($cleanroom_name_2, '<PROVIDER_ACCOUNT_LOCATOR>');
After the clean room has been installed, the provider has to finish setting up the clean room on their side before it is enabled for use. The below function allows you to check the status of the clean room. Once it has been enabled, you should be able to run the Run Analysis command below. It typically takes about 1 minute for the clean room to be enabled.
CALL samooha_by_snowflake_local_db.consumer.is_enabled($cleanroom_name_1);
CALL samooha_by_snowflake_local_db.consumer.is_enabled($cleanroom_name_2);
After a clean room share has been installed, the list of clean rooms available can be viewed using the below command.
CALL samooha_by_snowflake_local_db.consumer.view_cleanrooms();
Link the dataset¶
Link datasets into the clean room to carry out secure computation with the provider’s data.
CALL samooha_by_snowflake_local_db.consumer.link_datasets($cleanroom_name_1, ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS']);
CALL samooha_by_snowflake_local_db.consumer.link_datasets($cleanroom_name_2, ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS']);
Note
If this step doesn’t work even though your table exists, the database that contains the table might not be registered. To register the database, execute the following commands as a user with the ACCOUNTADMIN role.
USE ROLE accountadmin;
CALL samooha_by_snowflake_local_db.provider.register_db('<DATABASE_NAME>');
USE ROLE samooha_app_role;
If you want to view the datasets that you have added to the clean room, call the following procedure.
CALL samooha_by_snowflake_local_db.consumer.view_consumer_datasets($cleanroom_name_1);
CALL samooha_by_snowflake_local_db.consumer.view_consumer_datasets($cleanroom_name_2);
Consumer: Send request to add consumer-defined template to clean rooms¶
The consumer running a multi-provider analysis is often in the best position to know how to extract value from the data from multiple parties. A consumer can send a request to clean room providers to include a consumer-defined template in the clean room so the consumer can use it to run analyses. For more information about adding consumer-defined templates to a clean room, see Adding Consumer-Defined Templates.
Note
Analyses in multi-provider clean rooms work by rendering all tables from other clean rooms, as well from the current clean room, into the source_tables
argument in the template. However, during security validations and request approval, each clean room also receives just the tables relevant to them. Therefore, the template should be written generically without assuming a specific number of source_table
arguments, which allows it to scale across one or more source_table
inputs. You can use a Jinja for-loop to implement this.
For this example, the consumer send the following requests. Each request includes the Jinja code that defines the template.
CALL samooha_by_snowflake_local_db.consumer.create_template_request($cleanroom_name_1, 'multiprovider_overlap_analysis', $$
WITH union_data AS (
select identifier({{ hem_col[0] | join_policy }}) as join_col, identifier({{ dimensions[0] | column_policy }}) as group_col
{% for dim in dimensions[1:] %}
, identifier({{ dim }}) as group_col_{{ loop.index }}
{% endfor %}
from identifier({{ source_table[0] }}) p1
{% for src in source_table[1:] %}
inner join (
select identifier({{ hem_col[loop.index] | join_policy }}), identifier({{ dimensions[loop.index] | column_policy }}) from identifier({{ src }}) p{{ loop.index + 1}} ) p{{ loop.index + 1}}
on identifier({{ hem_col[0] }}) = identifier({{ hem_col[loop.index] }})
{% endfor %}
)
select count(distinct join_col), group_col
{% for dim in dimensions[1:] %}
, group_col_{{ loop.index }}
{% endfor %}
from union_data u
inner join identifier({{ my_table[0] }}) c
on u.join_col = c.{{ consumer_join_col | sqlsafe }}
group by group_col
{% for dim in dimensions[1:] %}
, group_col_{{ loop.index }}
{% endfor %}
$$);
CALL samooha_by_snowflake_local_db.consumer.create_template_request($cleanroom_name_2, 'multiprovider_overlap_analysis', $$
WITH union_data AS (
select identifier({{ hem_col[0] | join_policy }}) as join_col, identifier({{ dimensions[0] | column_policy }}) as group_col
{% for dim in dimensions[1:] %}
, identifier({{ dim }}) as group_col_{{ loop.index }}
{% endfor %}
from identifier({{ source_table[0] }}) p1
{% for src in source_table[1:] %}
inner join (
select identifier({{ hem_col[loop.index] | join_policy }}), identifier({{ dimensions[loop.index] | column_policy }}) from identifier({{ src }}) p{{ loop.index + 1}} ) p{{ loop.index + 1}}
on identifier({{ hem_col[0] }}) = identifier({{ hem_col[loop.index] }})
{% endfor %}
)
select count(distinct join_col), group_col
{% for dim in dimensions[1:] %}
, group_col_{{ loop.index }}
{% endfor %}
from union_data u
inner join identifier({{ my_table[0] }}) c
on u.join_col = c.{{ consumer_join_col | sqlsafe }}
group by group_col
{% for dim in dimensions[1:] %}
, group_col_{{ loop.index }}
{% endfor %}
$$);
Provider: Approve consumer request to add template¶
The provider must approve the consumer’s requests to include a consumer-defined template in the clean rooms.
The provider uses the provider.list_template_requests
command to list the requests, including obtaining the UUID of the new request. The provider then approves the requests by executing the provider.approve_template_request
command. For more information about this process, see Adding Consumer-Defined Templates.
CALL samooha_snowflake_local_db.provider.list_template_requests($cleanroom_name_1);
CALL samooha_by_snowflake_local_db.provider.approve_template_request($cleanroom_name_1, <REQUEST_UUID>);
CALL samooha_snowflake_local_db.provider.list_template_requests($cleanroom_name_2);
CALL samooha_by_snowflake_local_db.provider.approve_template_request($cleanroom_name_2, <REQUEST_UUID>);
Set the column policy on each table¶
For every table and template combination, set the columns that the consumer can use in an analysis, for example, the columns they can group on or aggregate. This gives flexibility so that the same table can allow different column selections depending on the underlying template. Wait until after you have added a template to set the column policy on a table.
Note that the column policy is replace only, so if the function is called again, then the previously set column policy is completely replaced by the new one.
Column policies should not be used on identity columns like email, HEM, or RampID because you don’t want the consumer to be able to group by these columns. In the production environment, the Snowflake infers PII columns and blocks this operation, but this inference is not available in a sandbox environment. It should only be used on columns that you want the consumer to be able to aggregate and group by, for example, Status, Age Band, Region Code, or Days Active.
For the “column_policy” and “join_policy” to carry out checks on the consumer analysis requests, all column names MUST be referred to as dimensions or measure_columns in the SQL Jinja template. Make sure you use these tags to refer to columns you want to be checked in custom SQL Jinja templates.
To set the column policy for a template/table combination, execute the following commands:
CALL samooha_by_snowflake_local_db.provider.set_column_policy($cleanroom_name_1, [ 'multiprovider_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:STATUS']);
CALL samooha_by_snowflake_local_db.provider.set_column_policy($cleanroom_name_2, [
'multiprovider_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:REGION_CODE']);
Consumer: Raise multi-provider clean room analysis request¶
The next step is to raise requests for a multi-provider analysis across the set of clean rooms you have installed. This set of clean rooms is known as a clean room cluster. This process writes requests to each provider asking whether this multi-provider analysis can be approved and executed. These requests are asynchronously processed by the provider using either an automated or a manual flow. If an automated approval flow is used, request processing is done in about 2 minutes, and if approval is given by each clean room (after checking the request for compliance with the clean room’s security policies), the flow can be executed.
The multi-provider clean room request flow requires that the provider tables be specified under the source_table array argument. The tables need to start with the clean room name in which they exist. The overall form is <CLEANROOM_NAME>.<DB>.<SCHEMA>.<TABLE> . Consumer tables can be provided under the my_table array argument.
CALL samooha_by_snowflake_local_db.consumer.prepare_multiprovider_flow(
[$cleanroom_name_1, $cleanroom_name_2],
'prod_aggregate_data',
object_construct(
'source_table', [
concat($cleanroom_name_1, '.SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'),
concat($cleanroom_name_2, '.SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS')
],
'my_table', ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS']),
'hem_col', ['p1.HASHED_EMAIL', 'p2.HASHED_EMAIL'],
'dimensions', ['p1.STATUS', 'p2.STATUS'],
'consumer_join_col', 'HASHED_EMAIL'
)
);
This API first verifies the request with each clean room’s security policies before raising the multi-provider analysis request to each clean room’s provider. If these checks fail, this API will return the error message it encountered.
How to determine the inputs to prepare_multiprovider_flow¶
To run the analysis, you need to pass in some parameters to the run_analysis function. This section shows you how to determine what parameters to pass in.
Template names
First, you can see the supported analysis templates by executing the following command:
CALL samooha_by_snowflake_local_db.consumer.view_added_templates($cleanroom_name);
Before running an analysis with a template, you need to know what arguments to specify and what types are expected. For custom templates, you can execute the following command:
CALL samooha_by_snowflake_local_db.consumer.view_template_definition($cleanroom_name, 'prod_custom_template');
This returns a large number of different SQL Jinja parameters. To parse the SQL Jinja template and extract the arguments that need to be specified in run_analysis into a list, execute the following command.
CALL samooha_by_snowflake_local_db.consumer.get_arguments_from_template($cleanroom_name, 'prod_custom_template');
Dataset names
If you want to view the dataset names that have been added to the clean room by the provider, execute the following command. Note that you cannot view the data present in the datasets that have been added to the clean room by the provider.
CALL samooha_by_snowflake_local_db.consumer.view_provider_datasets($cleanroom_name);
You can also see the tables you’ve linked to the clean room by executing the following command:
CALL samooha_by_snowflake_local_db.consumer.view_consumer_datasets($cleanroom_name);
Dimension and measure columns
While running the analysis, you might want to filter, group by and aggregate on certain columns. If you want to view the column policy that has been added to the clean room by the provider, execute the following command:
CALL samooha_by_snowflake_local_db.consumer.view_provider_column_policy($cleanroom_name);
Common errors
If you are getting Not approved: unauthorized columns used error as a result of run analysis, try viewing the join policy and column policy set by the provider.
CALL samooha_by_snowflake_local_db.consumer.view_provider_join_policy($cleanroom_name);
CALL samooha_by_snowflake_local_db.consumer.view_provider_column_policy($cleanroom_name);
It is also possible that you have exhausted your privacy budget, which prevents you from executing more queries. Your remaining privacy budget can be viewed using the following command. The budget resets daily, or the clean room provider can reset it manually.
CALL samooha_by_snowflake_local_db.consumer.view_remaining_privacy_budget($cleanroom_name);
You can check if Differential Privacy has been enabled for your clean room using the following API:
CALL samooha_by_snowflake_local_db.consumer.is_dp_enabled($cleanroom_name);
Provider: Enable multi-provider analysis and approve requests¶
You are now switching back to the provider account to enable multi-provider analysis for the consumer and to process the requests raised.
Enable the clean rooms for multi-provider analysis¶
A consumer cannot successfully execute a multi-provider analysis until you enable the clean room for this purpose. When enabling the clean room for the consumer, you also specify which clean rooms can be included in the multi-provider analysis. The consumer cannot execute a multi-provider analysis that involves your clean room and the clean room of another provider unless you expressly approve the other provider’s clean room.
To allow a consumer to run a multi-provider analysis using a clean room from any other provider, specify an empty list when executing the following commands.
Note
This can only be called on consumers who have already installed the clean room.
CALL samooha_by_snowflake_local_db.provider.enable_multiprovider_computation($cleanroom_name_1, '<CONSUMER_ACCOUNT_LOCATOR>', [concat('<PROVIDER_ORG_NAME>.<PROVIDER_ACCOUNT_NAME>.', $cleanroom_name_2)]);
CALL samooha_by_snowflake_local_db.provider.enable_multiprovider_computation($cleanroom_name_2, '<CONSUMER_ACCOUNT_LOCATOR>', [concat('<PROVIDER_ORG_NAME>.<PROVIDER_ACCOUNT_NAME>.', $cleanroom_name_1)]);
Approving multi-provider analysis requests¶
After multi-provider analysis has been enabled in each clean room, all multi-provider analysis requests raised against the clean rooms can be viewed using the following API:
CALL samooha_by_snowflake_local_db.provider.view_multiprovider_requests($cleanroom_name_1, '<CONSUMER_ACCOUNT_LOCATOR>');
CALL samooha_by_snowflake_local_db.provider.view_multiprovider_requests($cleanroom_name_2, '<CONSUMER_ACCOUNT_LOCATOR>');
Request approval can happen in one of two ways:
Automatically using a task that listens to incoming requests and triggers when needed.
Manually by a user explicitly processing incoming multi-provider analysis requests.
By default, approvals are manual and the task is created in a suspended state by the enable_multiprovider_computation
API. This requires a user to manually process incoming requests. This can be done by executing the following commands:
CALL samooha_by_snowflake_local_db.provider.process_multiprovider_request($cleanroom_name_1, '<CONSUMER_ACCOUNT_LOCATOR>', '<REQUEST_ID>');
CALL samooha_by_snowflake_local_db.provider.process_multiprovider_request($cleanroom_name_2, '<CONSUMER_ACCOUNT_LOCATOR>', '<REQUEST_ID>');
Note
This API can either be made to operate on a request ID level by specifying a particular request ID from the output of view_multiprovider_requests
API, or REQUEST_ID can be passed in as ‘-1’ to process all incoming requests.
This process does not approve incoming requests, but rather passes them through a processing logic to verify if it can be approved based on factors such as the age of the request and whether the requested collaborators are in your list of approved collaborators you set during enable_multiprovider_computation
.
Once the request is processed, it is written to the samooha_cleanroom_${UUID}.admin.request_log_multiprovider
table. You can see the list of requests and whether they have been approved or not through queries like the following:
SELECT * FROM samooha_cleanroom_Samooha_Cleanroom_Multiprovider_Clean_Room_1.admin.request_log_multiprovider;
SELECT * FROM samooha_cleanroom_Samooha_Cleanroom_Multiprovider_Clean_Room_2.admin.request_log_multiprovider;
Previous requests that haven’t been approved can also be overridden to APPROVE=True
, or conversely access can be removed by setting APPROVE=False
using the following queries. See the Security Considerations section below for more details on the <CONDITIONS>.
-- Override and approve a request that had previously been rejected
UPDATE samooha_cleanroom_Samooha_Cleanroom_Multiprovider_Clean_Room_1.admin.request_log_multiprovider set APPROVED=True where <CONDITIONS>;
UPDATE samooha_cleanroom_Samooha_Cleanroom_Multiprovider_Clean_Room_2.admin.request_log_multiprovider set APPROVED=True where <CONDITIONS>;
-- Revoke access to a query you had previously approved
UPDATE samooha_cleanroom_Samooha_Cleanroom_Multiprovider_Clean_Room_1.admin.request_log_multiprovider set APPROVED=False where <CONDITIONS>;
UPDATE samooha_cleanroom_Samooha_Cleanroom_Multiprovider_Clean_Room_2.admin.request_log_multiprovider set APPROVED=False where <CONDITIONS>;
If the automated approval flow is preferred over manual approvals, then the multi-provider analysis tasks need to be resumed by executing the following commands:
CALL samooha_by_snowflake_local_db.provider.resume_multiprovider_tasks($cleanroom_name_1, '<CONSUMER_ACCOUNT_LOCATOR>');
CALL samooha_by_snowflake_local_db.provider.resume_multiprovider_tasks($cleanroom_name_2, '<CONSUMER_ACCOUNT_LOCATOR>');
Conversely, if the automated approval flow is currently engaged, it can be switched off by executing the following commands:
CALL samooha_by_snowflake_local_db.provider.suspend_multiprovider_tasks($cleanroom_name_1, '<CONSUMER_ACCOUNT_LOCATOR>');
CALL samooha_by_snowflake_local_db.provider.suspend_multiprovider_tasks($cleanroom_name_2, '<CONSUMER_ACCOUNT_LOCATOR>');
Security considerations: Managing multi-provider analysis requests¶
Multi-provider analyses work through adding the hash of an approved query to a row access policy. The row access policy is usually created under the samooha_cleanroom_${UUID}.shared_schema
schema, and the definition of the row access policy is:
CREATE OR REPLACE ROW ACCESS POLICY samooha_cleanroom_${UUID}.shared_schema.${firewall_name} AS (foo varchar) RETURNS BOOLEAN ->
EXISTS (SELECT request_id FROM samooha_cleanroom_${UUID}.admin.REQUEST_LOG_MULTIPROVIDER w
WHERE party_account=current_account()
AND approved=true
AND sha2(current_statement()) = query_hash
);
The row access policy works by finding approved requests in the samooha_cleanroom_${UUID}.admin.request_log_multiprovider
table on your account for the specific consumer of your clean room, and checking that the hash of the current query running in their account matches the query that was approved.
All security access to your data for multi-provider flows is gated through the approvals noted in this table (samooha_cleanroom_${UUID}.admin.request_log_multiprovider
). You must manage it proactively to ensure only the queries you wish to have access to your data are allowed to run.
Simple queries can be used to either approve or un-approve (that is, remove the approvals of) previously processed requests. For example, you can execute the following queries:
-- Override and approve a request that had previously been rejected
UPDATE samooha_cleanroom_Samooha_Cleanroom_Multiprovider_Clean_Room_1.admin.request_log_multiprovider SET APPROVED=True WHERE <CONDITIONS>;
UPDATE samooha_cleanroom_Samooha_Cleanroom_Multiprovider_Clean_Room_2.admin.request_log_multiprovider SET APPROVED=True WHERE <CONDITIONS>;
-- Revoke access to a query you had previously approved
UPDATE samooha_cleanroom_Samooha_Cleanroom_Multiprovider_Clean_Room_1.admin.request_log_multiprovider SET APPROVED=False WHERE <CONDITIONS>;
UPDATE samooha_cleanroom_Samooha_Cleanroom_Multiprovider_Clean_Room_2.admin.request_log_multiprovider SET APPROVED=False WHERE <CONDITIONS>;
As seen in the row access policy, access is given to the data depending on the query_hash
. However, the query_hash
can also depend on which clean room is randomly selected to execute the query, so the <CONDITIONS> passed into the query above to determine which requests to revoke access to/approve need to follow these best practices:
If manually approving a request, try to be as specific as possible about the request by either filtering on
request_id
and/orcleanroom_names
,requester_account
andtemplate_name
.If revoking a previous approval for a query, revoke all requests where the
query_hash
matches, and also revoke all requests for that givenrequester_account
andtemplate_name
andcleanroom_names
(see example below).If a new request is not approved, it does not change the approval of previous requests. If you wish to revoke previous query access, you need to mark the corresponding requests with APPROVED=False in the
samooha_cleanroom_${UUID}.admin.request_log_multiprovider
table.If you change the set of allowed collaborators by running
enable_multiprovider_computation
again, previous requests are not revoked by default. You need to manage approvals by revoking access to previous collaborations by setting APPROVED=False in thesamooha_cleanroom_${UUID}.admin.request_log_multiprovider
table.
An example of how to thoroughly revoke a particular collaboration’s ability to make requests. Suppose there was a collaboration running on requester_account=ABC123
that you wanted to revoke. You can execute the following queries:
UPDATE samooha_cleanroom_{UUID}.admin.request_log_multiprovider SET APPROVED=False WHERE requester_account = 'ABC123' AND query_hash = '<HASH>';
UPDATE samooha_cleanroom_{UUID}.admin.request_log_multiprovider SET APPROVED=False WHERE requester_account = 'ABC123' AND template_name = '<TEMPLATE_NAME>' AND request:CLEANROOM_NAMES = [$cleanroom_name1, $cleanroom_name2];
The access for both the query hash and for that template in that clean room collaboration are revoked.
Here are some sample queries that allow you to see how many queries have been raised by each requester account
:
SELECT requester_account, count(*) FROM samooha_cleanroom_{UUID}.admin.request_log_multiprovider;
-- If there is a requester_account raising too many queries they can be disallowed in bulk
UPDATE samooha_cleanroom_{UUID}.admin.request_log_multiprovider SET APPROVED=False WHERE requester_account = '<ACCOUNT>';
Consumer - Execute request¶
After the request raised by the prepare_multiprovider_flow API has been approved, it can be executed using the following API across the clean room cluster. This execution happens by one clean room being randomly selected to execute the flow, and as long as both clean rooms have approved the request, the multi-provider analysis is allowed to proceed.
CALL samooha_by_snowflake_local_db.consumer.execute_multiprovider_flow([$cleanroom_name_1, $cleanroom_name_2]);
Note also that once a request has been approved after the prepare_multiprovider_flow procedure, the execute_multiprovider_flow can be called as many times as needed without needing to call prepare_multiprovider_flow again. However, in order to run a different analysis or one across a different clean room cluster, the prepare_multiprovider_flow needs to be called again.
Useful Functions¶
Once the interoperability requests are written to each provider, the approval takes the form of request-specific customer templates that get added to the clean room and executed during the analysis. These set up the infrastructure required. You can view these templates being added in real-time by the request’s approval process by executing the following commands.
CALL samooha_by_snowflake_local_db.consumer.view_added_templates($cleanroom_name_1);
CALL samooha_by_snowflake_local_db.consumer.view_added_templates($cleanroom_name_2);