Snowflake Data Clean Rooms: Running an Analysis as a Provider¶
This topic provides an example of a provider creating and sharing a clean room, then running an analysis in the clean room after the consumer links their data. The process of a provider executing an analysis in the clean room is commonly referred to as “provider-run analysis”.
Important
If a consumer allows a provider to run an analysis on a template, the consumer, not the provider, is charged for the credits consumed by the provider’s analysis. After the consumer has allowed the provider to run analyses, the consumer must uninstall the clean room to stop incurring costs.
If a consumer wants to obtain an estimate of the number of credits consumed by the provider within a specific time period, they can
execute the following query, where -5
returns an estimate of the previous 5 days of compute consumption by the provider:
SELECT * FROM table(samooha_by_snowflake_local_db_dev.public.udtf(-5));
The flow of this example switches between the actions taken by the provider and the actions taken by the consumer. This flows consists of the following actions:
Provider:
a. Creating a clean room for an overlay study.
b. Linking datasets to the clean room.
c. Adding policies governing which columns can be joined on and used in the analysis.
d. Enabling a predefined overlap analysis template.
e. Sharing the clean room with the consumer.
f. Configuring the clean room to allow themselves to run an analysis.
g. Configuring the clean room to prevent the consumer from running analyses. In this example, the consumer acts as a data provider only.
h. Running an overlap study with the consumer who has installed the clean room.
Consumer:
a. Installing the clean room shared by the provider.
b. Linking datasets to the clean room.
c. Setting security policies on datasets.
d. Approving the provider’s ability to run analyses in the clean room.
Prerequisites¶
You need two separate Snowflake accounts to complete this example. Use the first account to execute the provider’s commands, then switch to the second account to execute the consumer’s commands.
Provider¶
Use a Snowflake worksheet in the provider account to execute the commands in this section.
Set up the environment¶
Before using the Developer APIs, you need to assume the SAMOOHA_APP_ROLE role and specify the warehouse used to execute the APIs. If you don’t have the SAMOOHA_APP_ROLE role, contact your account administrator.
Execute the following commands to set up your environment:
USE ROLE samooha_app_role;
USE WAREHOUSE app_wh;
Create the clean room¶
Create a name for the clean room. Enter a new clean room name to avoid colliding with existing clean room names. Note that clean room names can only be alphanumeric. Clean room names cannot contain special characters other than spaces and underscores.
First, execute the following command to set a clean room name for the example clean room:
SET cleanroom_name = 'Provider Run Analysis Overlap';
If a clean room with the specified name already exists, this process fails.
This procedure may take a little longer to run, typically about half a minute.
The second argument to provider.cleanroom_init is the distribution of the clean room. This can either be INTERNAL or EXTERNAL. For testing purposes, if you are sharing the clean room to an account in the same organization, you can use INTERNAL to bypass the automated security scan that must take place before an application package is released to collaborators. However, if you are sharing this clean room to an account in a different organization, you must use an EXTERNAL clean room distribution.
CALL samooha_by_snowflake_local_db.provider.cleanroom_init($cleanroom_name, 'INTERNAL');
In order to view the status of the security scan, execute the following command:
CALL samooha_by_snowflake_local_db.provider.view_cleanroom_scan_status($cleanroom_name);
Once you have created your clean room, you must set its release directive before it can be shared with any collaborator. However, if your distribution was set to EXTERNAL, you must first wait for the security scan to complete before setting the release directive. You can continue running the remainder of the steps and return here before the provider.create_cleanroom_listing step while the scan runs.
In order to set the release directive, execute the following command:
CALL samooha_by_snowflake_local_db.provider.set_default_release_directive($cleanroom_name, 'V1_0', '0');
Link the dataset¶
You can link tables into the clean room by specifying fully qualified table names (<Database>.<Schema>.<Table>) as an array. For example, to link a table into both clean rooms, execute the following command:
CALL samooha_by_snowflake_local_db.provider.link_datasets($cleanroom_name, ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS']);
Note
If this step doesn’t work even though your table exists, the database that contains the table might not be registered. To register the database, execute the following commands as a user with the ACCOUNTADMIN role.
USE ROLE accountadmin;
CALL samooha_by_snowflake_local_db.provider.register_db('<DATABASE_NAME>');
USE ROLE samooha_app_role;
You can verify that the datasets were linked to the clean room by executing the following command:
CALL samooha_by_snowflake_local_db.provider.view_provider_datasets($cleanroom_name);
Set the join policy for the dataset¶
To determine which columns to use as the join policy, you can look at your dataset to determine the PII columns. For example, to see the top 10 rows of a table, execute the following query:
SELECT * FROM SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS LIMIT 10;
Next, specify which columns the consumer is allowed to join on when running templates within the clean room. This procedure should be called on identity columns like email. If you set the join policy a second time, the previously set join policy is completely replaced by the new one.
CALL samooha_by_snowflake_local_db.provider.set_join_policy($cleanroom_name, ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:HEM']);
If you want to view the join policy that has been added to the clean room, execute the following command:
CALL samooha_by_snowflake_local_db.provider.view_join_policy($cleanroom_name);
Add analysis template to the clean room¶
Templates determine the type of analyses that can be executed in a clean room. In this example, you are using a predefined template that lets collaborators run an analysis on the overlap between the provider’s datasets and the consumer’s datasets.
Note that this template natively implements the additional security guarantees provided by differential privacy.
CALL samooha_by_snowflake_local_db.provider.add_templates($cleanroom_name, ['prod_overlap_analysis']);
Set the column policy on each table¶
Before setting a column policy on a table, execute a query to display the columns of the table. For example, to view the top 10 rows, execute the following command:
SELECT * FROM SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS LIMIT 10;
For every table and template combination, set the columns that the analyst can use in an analysis, for example, the columns they can group on or aggregate. This gives flexibility so that the same table can allow different column selections depending on the underlying template. Wait until after you have added a template to set the column policy on a table.
Note that the column policy is replace only, so if the function is called again, then the previously set column policy is completely replaced by the new one.
Column policies should not be used on identity columns like email, HEM, or RampID because you don’t want the analyst to be able to group by these columns. In the production environment, the Snowflake infers PII columns and blocks this operation, but this inference is not available in a sandbox environment. It should only be used on columns that you want the analyst to be able to aggregate and group by, for example, Status, Age Band, Region Code, or Days Active.
To set the column policy for a template/table combination, execute the following command:
CALL samooha_by_snowflake_local_db.provider.set_column_policy($cleanroom_name, [
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:STATUS',
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:AGE_BAND',
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:DAYS_ACTIVE',
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:REGION_CODE']);
If you want to view the column policy that has been added to the clean room, execute the following command:
call samooha_by_snowflake_local_db.provider.view_column_policy($cleanroom_name);
Configuring who can run analyses¶
Note
You must use the APIs in this section after you share the clean room with a consumer, but before the consumer installs it. If you change the configuration of the clean room after a consumer has installed it, the consumer must reinstall the clean room.
For this example, you are going to configure the clean room so the provider can run analyses, but the consumer cannot. This means the consumer will only be able to add datasets and set security policies.
To configure the clean room so the provider can run an analysis, execute the following command:
CALL samooha_by_snowflake_local_db.provider.enable_provider_run_analysis($cleanroom_name, ['<CONSUMER_ACCOUNT_LOCATOR>']);
Note
Though not used for this example, the provider can reverse the configuration and prevent themselves from running analyses by executing the provider.disable_provider_run_analysis
command.
By default, a consumer can run analyses in a clean room. To configure the clean room to prevent the consumer from running analyses, execute the following command:
CAll samooha_by_snowflake_local_db.provider.disable_consumer_run_analysis($cleanroom_name, ['<CONSUMER_ACCOUNT_LOCATOR>']);
Note
Though not used for this example, the provider can reverse the configuration and allow the consumer to run analyses by executing the provider.enable_consumer_run_analysis
command.
Consumer¶
You are now switching to act as the consumer of the clean room. Use a Snowflake worksheet in the consumer account to execute the commands in this section.
Set up the environment¶
Before using the Developer APIs, you need to assume the SAMOOHA_APP_ROLE role and specify the warehouse used to execute the APIs. If you don’t have the SAMOOHA_APP_ROLE role, contact your account administrator.
Execute the following commands to set up your environment:
USE ROLE samooha_app_role;
USE WAREHOUSE app_wh;
Install the clean room¶
Before installing the clean room that the provider has shared with you, assign a name for the clean room.
SET cleanroom_name = 'Provider Run Analysis Overlap';
The following command installs the clean room in the consumer account:
CALL samooha_by_snowflake_local_db.consumer.install_cleanroom($cleanroom_name, '<PROVIDER_ACCOUNT_LOCATOR>');
Link the dataset¶
Now you can link some of your datasets into the clean room. The provider will use this data when they run an analysis in the clean room.
CALL samooha_by_snowflake_local_db.consumer.link_datasets($cleanroom_name, ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS']);
Note
If this step doesn’t work even though your table exists, the database that contains the table might not be registered. To register the database, execute the following commands as a user with the ACCOUNTADMIN role.
USE ROLE accountadmin;
CALL samooha_by_snowflake_local_db.provider.register_db('<DATABASE_NAME>');
USE ROLE samooha_app_role;
If you want to view the datasets that you have added to the clean room, call the following procedure.
CALL samooha_by_snowflake_local_db.consumer.view_consumer_datasets($cleanroom_name);
Set the secure join policies for your dataset¶
In this section, you’ll specify which columns the provider is allowed to join on when executing analyses in the clean room. You should execute the following command on identity columns like email. The join policy is “replace only”, so if the function is called again, then the previously set join policy is completely replaced by the new one.
CALL samooha_by_snowflake_local_db.consumer.set_join_policy($cleanroom_name, ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:HEM']);
Set column policies for your dataset¶
For every table and template combination, set the columns that the provider can use in an analysis, for example, the columns they can group on or aggregate. This gives flexibility so that the same table can allow different column selections depending on the underlying template. Wait until after you have added a template to set the column policy on a table.
Note that the column policy is replace only, so if the function is called again, then the previously set column policy is completely replaced by the new one.
Column policies should not be used on identity columns like email, HEM, or RampID because you don’t want the provider to be able to group by these columns. In the production environment, the Snowflake infers PII columns and blocks this operation, but this inference is not available in a sandbox environment. It should only be used on columns that you want the provider to be able to aggregate and group by, for example, Status, Age Band, Region Code, or Days Active.
To set the column policy for a template/table combination, execute the following command:
CALL samooha_by_snowflake_local_db.consumer.set_column_policy($cleanroom_name, [
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:STATUS',
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:AGE_BAND',
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:DAYS_ACTIVE',
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:REGION_CODE'
]);
Allow provider to run analyses¶
A consumer needs to explicitly approve a provider’s ability to run analyses on a template-by-template basis. Without this approval, the provider cannot run an analysis, even if they configured the clean room so they can.
First, you can check if the provider configured the clean room so they can execute analyses.
CALL samooha_by_snowflake_local_db.library.is_provider_run_enabled($cleanroom_name)
Next, you can execute the following command to allow the provider to run analyses. You must execute the command for every template you want the provider to be able to use. You should give your approval after setting the join and column policies to prevent the provider from running an analysis on unprotected data.
CALL samooha_by_snowflake_local_db.consumer.enable_templates_for_provider_run($cleanroom_name, ['prod_overlap_analysis']);
Provider¶
You are now switching back to the provider account to run the analysis in the clean room. Use a Snowflake worksheet in the provider account to execute the commands in this section.
Get access to information from consumer¶
Running an analysis as a provider processes data in the consumer’s account. To retrieve the results of the analysis, the provider needs access to information coming back from the consumer to the provider. Execute the following command to gain access to information originating from the consumer:
CALL samooha_by_snowflake_local_db.provider.mount_request_logs_for_all_consumers($cleanroom_name);
Run the analysis¶
The provider executes the provider.submit_analysis_request
command when they want to run an analysis.
CALL samooha_by_snowflake_local_db.provider.submit_analysis_request(
$cleanroom_name,
'<CONSUMER_ACCOUNT>',
'prod_overlap_analysis',
['SAMOOHA_SAMPLE_DATABASE.demo.customers'],
['SAMOOHA_SAMPLE_DATABASE.demo.customers'],
object_construct(
'dimensions', ['c.REGION_CODE'],
'measure_type', ['AVG'],
'measure_column', ['c.DAYS_ACTIVE'],
'where_clause', 'p.hem=c.hem'
));
The provider.submit_analysis_request
command returns a request identifier. You must save this identifier so you can check the status of the analysis request
and retrieve the analysis results. For example, you could execute the following to save the identifier into a local variable:
SET request_id = '<REQUEST_ID>';
After you submit the request to perform an analysis, you can look at the status of the request by executing the following command. Note that there might be a delay when you execute the command for the first time.
CALL samooha_by_snowflake_local_db.provider.check_analysis_status(
$cleanroom_name,
$request_id,
'<CONSUMER_ACCOUNT_LOCATOR>'
);
Note
If this API fails, make sure you configured the clean room to allow provider run analyses before you installed their clean room as the consumer. If the clean room was configured after the consumer installed a clean room, the consumer must re-install the clean room.
When the status returned by the provider.check_analysis_status
command is COMPLETED, you can retrieve the results of the analysis by executing the following command:
CALL samooha_by_snowflake_local_db.provider.get_analysis_result(
$cleanroom_name,
$request_id,
'<CONSUMER_ACCOUNT_LOCATOR>'
);