Free-form SQL queries

A data provider can allow their data to be exposed to an analysis runner via a template or free-form queries. When a data provider enables free-form queries on a dataset, any analysis runners with access to the data offering can run SQL queries in their environment against that dataset.

Analysis runners and data providers must both have joined the collaboration before the data becomes available.

Overview

Here are the steps to run free-form queries against data in a clean room:

Data provider

  1. Register a data offering that contains one or more datasets where allowed_analyses: template_and_freeform_sql is specified.

    If the data provider wants to apply Snowflake policies to columns in the dataset, they must create those policies before registering the data, and associate the policies with the columns in the data offering specification.

  2. Link the data offering into the collaboration in the standard way.

Analysis runner

After the collaboration is installed on their account, the analysis runner calls VIEW_DATA_OFFERINGS. If there is a value in the freeform_sql_view_name column, the dataset can be queried directly against the view named in that column.

Any policies listed in freeform_sql_column_policies are applied to the data by the collaboration. Any policies applied directly to the source data by the data provider are enforced, but won’t be shown in that column.

Details about the data provider and analysis steps are given in the following sections.

Registering a free-form query dataset (Data Provider)

The following steps show how to enable free-form queries during data offering registration:

  1. Specify allowed_analyses: template_and_freeform_sql in the collaboration specification. This enables the dataset to be queried using either a template or free-form query.

    ...
    datasets:
    - alias: customers_view
      data_object_fqn: PROVIDER_DB.DATA_SCH.CUSTOMERS
      object_class: custom
      allowed_analyses: template_and_freeform_sql
      schema_and_template_policies:
        HASHED_EMAIL:
          category: join_standard
          column_type: hashed_email_b64_encoded
    ...
    

    Only the columns listed under schema_and_template_policies are available for querying via templates or free-form queries.

  2. If you want to apply Snowflake policies in free-form queries without applying them to your source data, take the following steps:

    1. Create your Snowflake policies in the standard way. Don’t apply them to your table.

      CREATE OR REPLACE AGGREGATION POLICY PROVIDER_DB.DATA_SCH.MIN_GROUP_SIZE_POLICY
        AS () RETURNS AGGREGATION_CONSTRAINT ->
          AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 5);
      

      The role that creates the collaboration must have the USAGE privilege on the database, schema, and policy object.

      These policies are linked dynamically; any changes that you make to these policies immediately affect any datasets that use those policies, even if the data offering is already registered and linked.

    2. Assign your policies in the data offering specification under the freeform_sql_policies field. Important: All column names used under freeform_sql_policies must use the auto-generated column name if the column has been renamed. Renaming affects only join-standard category columns.

      These policies aren’t applied directly to the source table, only to the view registered by the collaboration.

      schema_and_template_policies:
        HASHED_EMAIL:                                  # Source column name.
          category: join_standard
          column_type: hashed_email_b64_encoded        # Column is renamed to the column_type value.
        STATUS:
          category: passthrough
        AGE_BAND:
          category: passthrough
        DAYS_ACTIVE:
          category: passthrough
        INCOME_BRACKET:
          category: passthrough
      freeform_sql_policies:          # Apply agg, join, and masking policies created by the data owner to these columns.
        aggregation_policy:
          name: PROVIDER_DB.DATA_SCH.MIN_GROUP_SIZE_POLICY
          entity_keys:
            - HASHED_EMAIL_B64_ENCODED
        join_policy:
          name: PROVIDER_DB.DATA_SCH.EMAIL_JOIN_POLICY
          columns:
            - HASHED_EMAIL_B64_ENCODED    # This is the renamed column.
        masking_policies:
          - name: PROVIDER_DB.DATA_SCH.MASK_INCOME_POLICY
            columns:
              - INCOME_BRACKET
      
  3. Register the data offering in the standard way by calling REGISTER_DATA_OFFERING.

Running free-form queries (Analysis Runner)

When an analysis runner calls VIEW_DATA_OFFERINGS, if a value appears in the freeform_sql_view_name column, the free-form SQL view can be queried directly, without using a template. All Snowflake policies applied to the source table or defined in the data offering’s freeform_sql_policies section are enforced in the queries.

CALL SAMOOHA_BY_SNOWFLAKE_LOCAL_DB.COLLABORATION.VIEW_DATA_OFFERINGS($collaboration_name);

Column

Value

TEMPLATE_VIEW_NAME

data_provider.provider_customers_V1.customers

TEMPLATE_JOIN_COLUMNS

hashed_email_b64_encoded

ANALYSIS_ALLOWED_COLUMNS

STATUS, AGE_BAND, DAYS_ACTIVE, INCOME_BRACKET

ACTIVATION_ALLOWED_COLUMNS

FREEFORM_SQL_VIEW_NAME

SFDCR_FREEFORM_SQL_DEMO.FREEFORM_SQL.DATA_PROVIDER_PROVIDER_CUSTOMERS_V1_CUSTOMERS

FREEFORM_SQL_COLUMN_POLICIES

{
  "aggregation_policy": {"entity_keys": ["HASHED_EMAIL_B64_ENCODED"]},
  "masking_policy": {"columns": ["INCOME_BRACKET"]},
  "join_policy": {"columns": ["HASHED_EMAIL_B64_ENCODED"]},
  "no_policy": {"columns": ["DAYS_ACTIVE", "AGE_BAND", "STATUS"]}
}

SHARED_BY

data_provider

SHARED_WITH

["data_consumer"]

DATA_OFFERING_ID

provider_customers_V1

You must use the value from freeform_sql_view_name, not the value from template_view_name.

SELECT status, COUNT(*) AS customer_count
  FROM SFDCR_FREEFORM_SQL_DEMO.FREEFORM_SQL.DATA_PROVIDER_PROVIDER_CUSTOMERS_V1_CUSTOMERS AS t
  GROUP BY status
  ORDER BY customer_count DESC;

Example: Two-party collaboration

The following example demonstrates a two-party collaboration, where one party (the “provider”) is the collaboration owner and a data provider for the consumer. The other party (the “consumer”) is an analysis runner who can run the template and use the data provided by the provider, and also run free-form SQL queries on the data, subject to the policies defined in the data provider’s specification.

To run this example, you must have two separate accounts with Snowflake Data Clean Rooms installed.

You can either download the files and upload them to your Snowflake account, or copy and paste the example code into worksheets in two separate accounts by using Snowsight.

Download the source SQL files, and then upload them into two separate accounts that have Snowflake Data Clean Rooms installed: