Snowflake Data Clean Rooms: Overlap Analysis

This topic describes the provider and consumer flows needed to programmatically set up a clean room, share it with a consumer, and run provider-side data analyses in it. A provider-side data analysis is one where the consumer does not have to bring in their datasets but simply wants to get aggregated insights about the provider’s datasets.

It will cover the following:

  1. Provider:

    a. Creating a fresh clean room.

    b. Securely linking datasets to it.

    c. Adding policies that govern which columns can be joined on and which can be used in the analysis.

    d. Enabling a predefined overlap analysis template.

    e. Sharing it with a consumer.

  2. Consumer:

    a. Installing a clean room shared by the provider.

    b. Adding your datasets to the clean room.

    c. Examining the template provided within the clean room.

    d. Running an analysis within the clean room using the template.

Prerequisites

You need two separate Snowflake accounts to complete this flow. Use the first account to execute the provider’s commands, then switch to the second account to execute the consumer’s commands.

Provider

Note

The following commands should be run in a Snowflake worksheet in the provider account.

Set up the environment

Execute the following commands to set up the Snowflake environment before using developer APIs to work with a Snowflake Data Clean Room. If you don’t have the SAMOOHA_APP_ROLE role, contact your account administrator.

use role samooha_app_role;
use warehouse app_wh;

Create the clean room

Create a name for the clean room. Enter a new clean room name to avoid colliding with existing clean room names. Clean room names can contain only alphanumeric characters, spaces, and underscores.

set cleanroom_name = 'Overlap Analysis Demo Clean Room';
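
The naming rule above can be checked before the clean room is created. The following is a minimal sketch, not part of the clean room API: it uses the standard Snowflake REGEXP_LIKE function, and the column alias is only illustrative.

```sql
-- REGEXP_LIKE matches the pattern against the whole string, so this returns
-- TRUE only when the name consists of letters, digits, spaces, and underscores.
select regexp_like($cleanroom_name, '[A-Za-z0-9_ ]+') as is_valid_cleanroom_name;
```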

Create a new clean room with the name set above. If a clean room with that name already exists, this process fails.

This procedure typically takes about half a minute to run.

The second argument to provider.cleanroom_init is the distribution of the clean room. This can be either INTERNAL or EXTERNAL. For testing purposes, if you are sharing the clean room with an account in the same organization, you can use INTERNAL to bypass the automated security scan that must take place before an application package is released to collaborators. However, if you are sharing this clean room with an account in a different organization, you must use an EXTERNAL clean room distribution.

call samooha_by_snowflake_local_db.provider.cleanroom_init($cleanroom_name, 'INTERNAL');
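
For a consumer in a different organization, the same call would take the EXTERNAL distribution described above. This is a sketch of that variant only. Run it instead of the INTERNAL call, not in addition to it, because creating a clean room whose name already exists fails.

```sql
-- EXTERNAL distribution: triggers the automated security scan, which must
-- complete before the release directive can be set.
call samooha_by_snowflake_local_db.provider.cleanroom_init($cleanroom_name, 'EXTERNAL');
```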

To view the status of the security scan, use:

call samooha_by_snowflake_local_db.provider.view_cleanroom_scan_status($cleanroom_name);

Once you have created your clean room, you must set its release directive before it can be shared with any collaborator. However, if your distribution was set to EXTERNAL, you must first wait for the security scan to complete before setting the release directive. You can continue running the remainder of the steps while the scan completes and return here before the provider.create_or_update_cleanroom_listing step.

To set the release directive, call:

call samooha_by_snowflake_local_db.provider.set_default_release_directive($cleanroom_name, 'V1_0', '0');
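
The second and third arguments in the call above are the version and patch to release. To confirm which versions and patches exist before setting the directive, a sketch follows. It assumes, based on the show versions example later in this topic, that the application package is named samooha_cleanroom_ followed by the clean room name with spaces replaced by underscores.

```sql
-- Assumption: package name = 'samooha_cleanroom_' + clean room name with
-- spaces replaced by underscores (inferred from the example in this topic).
select 'samooha_cleanroom_' || replace($cleanroom_name, ' ', '_') as application_package_name;

-- List the available versions and patches for that package.
show versions in application package samooha_cleanroom_Overlap_Analysis_Demo_Clean_Room;
```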

Cross-region sharing

In order to share a clean room with a Snowflake customer whose account is in a different region than your account, you must enable Cross-Cloud Auto-Fulfillment. For information about the additional costs associated with collaborating with consumers in other regions, see Cross-Cloud Auto-Fulfillment costs.

When using developer APIs, enabling cross-region sharing is a two-step process:

  1. A Snowflake administrator with the ACCOUNTADMIN role enables Cross-Cloud Auto-Fulfillment for your Snowflake account. For instructions, see Collaborate with accounts in different regions.

  2. You execute the provider.enable_laf_for_cleanroom command to enable Cross-Cloud Auto-Fulfillment for the clean room. For example:

    call samooha_by_snowflake_local_db.provider.enable_laf_for_cleanroom($cleanroom_name);

After you have enabled Cross-Cloud Auto-Fulfillment for the clean room, you can add consumers to your listing as usual using the provider.create_or_update_cleanroom_listing command. The listing is automatically replicated to remote clouds and regions as needed.

Add analysis templates to the clean room

Add a list of predefined templates using their name identifiers. In this flow, you are going to add a predefined template that allows a consumer to carry out an analysis on the overlap between their datasets and provider datasets in a secure and provider-approved manner on provider-approved columns.

One crucial detail about this template is that it natively implements the additional security guarantees provided by Differential Privacy. See Differential Privacy to learn more.

call samooha_by_snowflake_local_db.provider.add_templates($cleanroom_name, ['prod_overlap_analysis']);

If you want to view the templates currently active in the clean room, call the following procedure. You can see the modifications you need to use to enable Differential Privacy guarantees on your analysis. A similar pattern can be incorporated into any custom template you choose to write.

Note

Note that all system-defined preset templates are encrypted and aren’t viewable by default. Any custom templates that you add will be visible, however.

call samooha_by_snowflake_local_db.provider.view_added_templates($cleanroom_name);

Any template added to the clean room can also be cleared away if needed. See the Provider API Reference Guide for more details.
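
As a sketch of what removal might look like, the call below assumes the provider API exposes a clear_template procedure that takes the clean room name and the template name. Check the Provider API Reference Guide for the exact signature before relying on it.

```sql
-- Assumption: provider.clear_template(cleanroom_name, template_name) removes
-- a single template from the clean room.
call samooha_by_snowflake_local_db.provider.clear_template($cleanroom_name, 'prod_overlap_analysis');
```

If you remove the template, add it again with provider.add_templates before continuing with this flow.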

Set the column policy on each table

Display the linked data to see the columns present in the table. To view 10 rows, run the following query.

select * from SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS limit 10;
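
The query above presumes that the table has already been linked into the clean room, and the join policy mentioned under Common errors must also be set by the provider. This topic does not show those calls. The following is a minimal sketch, assuming the link_datasets and set_join_policy procedures of the provider API and assuming HEM is the join column, since it is the column joined on in the run_analysis example.

```sql
-- Assumption: link the sample table into the clean room.
call samooha_by_snowflake_local_db.provider.link_datasets($cleanroom_name, ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS']);

-- Assumption: allow consumers to join on the HEM column of that table.
call samooha_by_snowflake_local_db.provider.set_join_policy($cleanroom_name, ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:HEM']);
```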

Set the columns the consumer can group and aggregate (e.g. SUM/AVG) and generally use in an analysis for every table and template combination. This gives flexibility so the same table can allow different column selections depending on the underlying template. This should only be called after adding the template.

Note that the column policy is replace-only: if the function is called again, the previously set column policy is completely replaced by the new one.

Column policy should not be used on identity columns like email, HEM, RampID, etc. since you don’t want the consumer to be able to group by these columns. In the production environment, the system will intelligently infer PII columns and block this operation, but this feature is not available in the sandbox environment. It should only be used on columns that you want the consumer to be able to aggregate and group by, like Status, Age Band, Channel, Days Active, etc.

call samooha_by_snowflake_local_db.provider.set_column_policy($cleanroom_name, [
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:STATUS', 
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:AGE_BAND', 
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:DAYS_ACTIVE', 
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:REGION_CODE']);
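
Because the policy is replace-only, extending it later means resending the complete list rather than only the new entry. The sketch below adds one more column. <NEW_COLUMN> is a placeholder for a column of your choosing, not a column known to exist in the sample table.

```sql
-- Resend every existing entry plus the new one; entries left out are dropped.
call samooha_by_snowflake_local_db.provider.set_column_policy($cleanroom_name, [
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:STATUS',
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:AGE_BAND',
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:DAYS_ACTIVE',
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:REGION_CODE',
'prod_overlap_analysis:SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS:<NEW_COLUMN>']);  -- placeholder column
```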

If you want to view the column policy that has been added to the clean room, call the following procedure.

call samooha_by_snowflake_local_db.provider.view_column_policy($cleanroom_name);

Share with a consumer

Finally, add a data consumer to the clean room by adding their Snowflake account locator and account names as shown below. The Snowflake account name must be of the form <ORGANIZATION>.<ACCOUNT_NAME>.

Note

In order to call the following procedures, make sure you have first set the release directive using provider.set_default_release_directive. You can see the latest available version and patches using:

show versions in application package samooha_cleanroom_Overlap_Analysis_Demo_clean_room;

call samooha_by_snowflake_local_db.provider.add_consumers($cleanroom_name, '<CONSUMER_ACCOUNT_LOCATOR>', '<CONSUMER_ACCOUNT_NAME>');
call samooha_by_snowflake_local_db.provider.create_or_update_cleanroom_listing($cleanroom_name);

Multiple consumer account locators can be passed into the provider.add_consumers function as a comma-separated string, or as separate calls to provider.add_consumers.
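
A sketch of both forms for two consumers follows. All locators and account names are placeholders. Passing the account names as a comma-separated string in the same order as the locators is an assumption, so verify it against the Provider API Reference Guide.

```sql
-- Form 1: one call, comma-separated locators (and, by assumption, names in the same order).
call samooha_by_snowflake_local_db.provider.add_consumers($cleanroom_name, '<LOCATOR_1>,<LOCATOR_2>', '<ORG_1>.<ACCOUNT_NAME_1>,<ORG_2>.<ACCOUNT_NAME_2>');

-- Form 2: one call per consumer.
call samooha_by_snowflake_local_db.provider.add_consumers($cleanroom_name, '<LOCATOR_1>', '<ORG_1>.<ACCOUNT_NAME_1>');
call samooha_by_snowflake_local_db.provider.add_consumers($cleanroom_name, '<LOCATOR_2>', '<ORG_2>.<ACCOUNT_NAME_2>');

-- Publish or refresh the listing once the consumers have been added.
call samooha_by_snowflake_local_db.provider.create_or_update_cleanroom_listing($cleanroom_name);
```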

If you want to view the consumers who have been added to this clean room, call the following procedure.

call samooha_by_snowflake_local_db.provider.view_consumers($cleanroom_name);

If you want to view the clean rooms that have been created recently, use the following procedure.

call samooha_by_snowflake_local_db.provider.view_cleanrooms();

If you want to get more insights about the clean room that you have created, use the following procedure.

call samooha_by_snowflake_local_db.provider.describe_cleanroom($cleanroom_name);

Any clean room created can also be deleted. The following command drops the clean room entirely, so any consumers who previously had access to the clean room will no longer be able to use it. If a clean room with the same name is desired in the future, it must be re-initialized using the above flow.

call samooha_by_snowflake_local_db.provider.drop_cleanroom($cleanroom_name);

Note

The provider flow is finished at this point. Switch to the consumer account to continue with the consumer flow.

Consumer

Note

The following commands should be run in a Snowflake worksheet in the consumer account.

Set up the environment

Execute the following commands to set up the Snowflake environment before using developer APIs to work with a Snowflake Data Clean Room. If you don’t have the SAMOOHA_APP_ROLE role, contact your account administrator.

use role samooha_app_role;
use warehouse app_wh;

Install the clean room

Once a clean room share has been installed, you can view the list of available clean rooms with the following command.

call samooha_by_snowflake_local_db.consumer.view_cleanrooms();

Assign a name for the clean room that the provider has shared with you.

set cleanroom_name = 'Overlap Analysis Demo Clean Room';

The following command installs the clean room in the consumer account with the associated provider and selected clean room. Enter the provider’s Snowflake account locator (not name).

This procedure may take a little longer to run, typically about half a minute.

call samooha_by_snowflake_local_db.consumer.install_cleanroom($cleanroom_name, '<PROVIDER_ACCOUNT_LOCATOR>');

Once the clean room has been installed, the provider has to finish setting up the clean room on their side before it is enabled for use. The below function allows you to check the status of the clean room. Once it has been enabled, you should be able to run the Run Analysis command below. It typically takes about 1 minute for the clean room to be enabled.

call samooha_by_snowflake_local_db.consumer.is_enabled($cleanroom_name);

Run the analysis

Now that the clean room is installed, you can run the analysis template given to the clean room by the provider using a “run_analysis” command. You can see how each field is determined in the sections below.

Note

Before running the analysis, you can alter the warehouse size, or use a new, bigger, warehouse size if your tables are large.
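
A sketch of both options in the note above follows. It assumes the app_wh warehouse from the environment setup and a role with the privileges needed to modify or use the warehouses. <LARGER_WAREHOUSE> is a placeholder, and the size shown is only illustrative.

```sql
-- Option 1: resize the current warehouse before the analysis.
alter warehouse app_wh set warehouse_size = 'LARGE';

-- Option 2: switch the session to a different, larger warehouse.
use warehouse <LARGER_WAREHOUSE>;
```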

-- Example run analysis procedure with single provider dataset

call samooha_by_snowflake_local_db.consumer.run_analysis(
  $cleanroom_name,                    -- cleanroom
  'prod_overlap_analysis',            -- template name

  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'], -- your tables
  
  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],  -- the provider table we want to carry out analysis on

  object_construct(                        -- The keyword arguments needed for the SQL Jinja template
      'dimensions', ['p.REGION_CODE'],        -- Group by column

      'measure_type', ['AVG'],           -- Aggregate function you want to perform like COUNT, AVG, etc.

      'measure_column', ['p.DAYS_ACTIVE'],     -- Columns you want to perform aggregate function on

      'where_clause', 'p.HEM=c.HEM'   -- A boolean filter, passed as a string literal
                                      -- Use $$...$$ quoting instead if the filter itself contains single quotes

    )
);

For each of the columns referred to in the dataset filter “where_clause”, the dimensions, or the measure_column, use p. to refer to fields in provider tables and c. to refer to fields in consumer tables. Use p2, p3, etc. for additional provider tables and c2, c3, etc. for additional consumer tables.
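
To illustrate the alias convention, the sketch below passes two provider tables and refers to the second one as p2. The second provider table, the consumer table, and the measure column are placeholders. It is also an assumption that the provider has linked the second table, set policies on it, and that the template accepts more than one provider table.

```sql
call samooha_by_snowflake_local_db.consumer.run_analysis(
  $cleanroom_name,
  'prod_overlap_analysis',
  ['<CONSUMER_DB>.<SCHEMA>.<TABLE>'],                  -- c
  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS',           -- p
   '<PROVIDER_DB>.<SCHEMA>.<SECOND_TABLE>'],           -- p2 (placeholder)
  object_construct(
      'dimensions', ['p.REGION_CODE'],                 -- column from the first provider table
      'measure_type', ['AVG'],
      'measure_column', ['p2.<MEASURE_COLUMN>'],       -- column from the second provider table
      'where_clause', 'p.HEM=c.HEM AND p2.HEM=c.HEM'   -- join both provider tables to the consumer table
    )
);
```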

How to determine the inputs to run_analysis

To run the analysis, you need to pass in some parameters to the run_analysis function. This section shows you how to determine what parameters to pass in.

Template names

First, you can see the supported analysis templates by calling the following procedure.

call samooha_by_snowflake_local_db.consumer.view_added_templates($cleanroom_name);

Before running an analysis with a template, you need to know what arguments to specify and what types are expected. For custom templates, you can execute the following.

Note

Note that all system-defined preset templates are encrypted and aren’t viewable by default. Any custom templates that you add will be visible, however.

call samooha_by_snowflake_local_db.consumer.view_template_definition($cleanroom_name, 'prod_overlap_analysis');

A template definition can contain a large number of SQL Jinja parameters. The following procedure parses the SQL Jinja template and extracts the arguments that need to be specified in run_analysis into a list.

Note

Note that all system-defined preset templates are encrypted, and so this function will not get the arguments for these templates. You will be able to retrieve the parameters for your custom templates, however.

call samooha_by_snowflake_local_db.consumer.get_arguments_from_template($cleanroom_name, 'prod_overlap_analysis');

Dataset names

If you want to view the dataset names that have been added to the clean room by the provider, call the following procedure. Note that you cannot view the data present in the datasets that have been added to the clean room by the provider due to the security properties of the clean room.

call samooha_by_snowflake_local_db.consumer.view_provider_datasets($cleanroom_name);

You can also see the tables you’ve linked to the clean room by using the following call:

call samooha_by_snowflake_local_db.consumer.view_consumer_datasets($cleanroom_name);
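
If the call above returns no tables, you have not yet linked any of your own datasets. This topic does not show that step. The following is a minimal sketch, assuming the consumer API exposes a link_datasets procedure that mirrors the provider one. The table name is a placeholder.

```sql
-- Assumption: consumer.link_datasets(cleanroom_name, tables) links consumer
-- tables so they can be referenced with the c. prefix in run_analysis.
call samooha_by_snowflake_local_db.consumer.link_datasets($cleanroom_name, ['<CONSUMER_DB>.<SCHEMA>.<TABLE>']);
```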

Dimension and measure columns

While running the analysis, you might want to filter, group by and aggregate on certain columns. If you want to view the column policy that has been added to the clean room by the provider, call the following procedure.

call samooha_by_snowflake_local_db.consumer.view_provider_column_policy($cleanroom_name);

Epsilon and privacy budgets

If the last call returned an error, the remaining privacy budget may be too small for the epsilon value you chose. You can check the remaining privacy budget using the following procedure.

call samooha_by_snowflake_local_db.consumer.view_remaining_privacy_budget($cleanroom_name);

The epsilon parameter that you specify is an input to the Differential Privacy mechanism operating inside the clean room. See the Differential Privacy section for more on how Differential Privacy works. The higher the value of epsilon you specify, the more of the finite privacy budget (reset daily) you consume, but the higher the accuracy of the result since less noise is added to the aggregated data.
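
The run_analysis example earlier in this topic does not pass epsilon. A sketch of how it might be supplied follows, assuming the template reads an 'epsilon' keyword argument from the same object_construct as the other arguments. The value 0.1 and its string form are illustrative only.

```sql
call samooha_by_snowflake_local_db.consumer.run_analysis(
  $cleanroom_name,
  'prod_overlap_analysis',
  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],   -- your tables
  ['SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS'],   -- the provider table
  object_construct(
      'dimensions', ['p.REGION_CODE'],
      'measure_type', ['AVG'],
      'measure_column', ['p.DAYS_ACTIVE'],
      'where_clause', 'p.HEM=c.HEM',
      'epsilon', '0.1'                          -- assumption: smaller values spend less budget but add more noise
    )
);

-- Check how much of the privacy budget remains afterward.
call samooha_by_snowflake_local_db.consumer.view_remaining_privacy_budget($cleanroom_name);
```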

Common errors

If run_analysis returns a Not approved: unauthorized columns used error, review the join policy and column policy set by the provider.

call samooha_by_snowflake_local_db.consumer.view_provider_join_policy($cleanroom_name);
call samooha_by_snowflake_local_db.consumer.view_provider_column_policy($cleanroom_name);

It is also possible that you have exhausted your privacy budget, which prevents you from executing more queries. Your remaining privacy budget can be viewed using the below command. It resets daily, or the clean room provider can reset it if they wish.

call samooha_by_snowflake_local_db.consumer.view_remaining_privacy_budget($cleanroom_name);

You can check if Differential Privacy has been enabled for your clean room using the following API:

call samooha_by_snowflake_local_db.consumer.is_dp_enabled($cleanroom_name);