Understanding Column-level Security

This topic provides a general overview of column-level security and describes the features and support that are common to both Dynamic Data Masking and External Tokenization.

What is Column-level Security?

Column-level security in Snowflake allows the application of a masking policy to a column within a table or view. Currently, column-level security includes two features:

  1. Dynamic Data Masking

  2. External Tokenization

Dynamic Data Masking is a column-level security feature that uses masking policies to selectively mask plain-text data in table and view columns at query time.

External Tokenization enables accounts to tokenize data before loading it into Snowflake and detokenize the data at query runtime. Tokenization is the process of removing sensitive data by replacing it with an undecipherable token. External Tokenization makes use of masking policies with external functions.

What are Masking Policies?

Snowflake supports masking policies as a schema-level object to protect sensitive data from unauthorized access while allowing authorized users to access sensitive data at query runtime. This means that sensitive data in Snowflake is not modified in an existing table (i.e. no static masking). Rather, when users execute a query in which a masking policy applies, the masking policy conditions determine whether unauthorized users see masked, partially masked, obfuscated, or tokenized data. Masking policies as a schema-level object also provide flexibility in choosing a centralized, decentralized, or hybrid management approach. For more information, see Managing Column-level Security (in this topic).

Masking policies can include conditions and functions to transform the data at query runtime when those conditions are met. The policy-driven approach supports segregation of duties to allow security teams to define policies that can limit sensitive data exposure, even to the owner of an object (i.e. the role with the OWNERSHIP privilege on the object, such as a table or view) who normally have full access to the underlying data.

The masking policy admin can apply masking policies to tables and views.

For example, masking policy administrators can implement a masking policy such that analysts (i.e. users with the custom ANALYST role) can only view the last four digits of a phone number and none of the social security number, while customer support representatives (i.e. users with the custom SUPPORT role) can view the entire phone number and social security number for customer verification use cases.

Masking policy results for authorized and unauthorized roles.

A masking policy consists of a single data type, one or more conditions, and one or more masking functions.

While Snowflake offers secure views to restrict access to sensitive data, secure views present management challenges due to large numbers of views and derived business intelligence (BI) dashboards from each view. Masking policies solve this management challenge by avoiding an explosion of views and dashboards to manage.

Masking policies support segregation of duties (SoD) through the role separation of policy administrators from object owners. Secure views do not have SoD, which is a profound limitation to their utility. This role separation leads to the following default settings:

  • Object owners (i.e. the role that has the OWNERSHIP privilege on the object) do not have the privilege to unset masking policies.

  • Object owners cannot view column data in which a masking policy applies.

The masking policy administrator can grant privileges to object owners or other roles to perform these actions.

How Does a Masking Policy Work?

Masking policies for Dynamic Data Masking and External Tokenization adopt the same structure and format with one notable exception: masking policies for External Tokenization require using External Functions in the masking policy body.

The reason for this exception is that External Tokenization requires a third-party tokenization provider to tokenize data before loading data into Snowflake. At query runtime, Snowflake uses the external function to make a REST API call to the tokenization provider, which then evaluates a tokenization policy (that is created outside of Snowflake) to return either tokenized or detokenized data based on the masking policy conditions. Note that role mapping must exist in Snowflake and the tokenization provider to ensure that the correct data can be returned from a given query.

Snowflake supports creating masking policies using CREATE MASKING POLICY. For example:

-- Dynamic Data Masking

create masking policy employee_ssn_mask as (val string) returns string ->
  case
    when current_role() in ('PAYROLL') then val
    else '******'
  end;

-- External Tokenization

  create masking policy employee_ssn_detokenize as (val string) returns string ->
  case
    when current_role() in ('PAYROLL') then ssn_unprotect(val)
    else val -- sees tokenized data
  end;

Where:

  • employee_ssn_mask

    The name of the Dynamic Data Masking policy.

  • employee_ssn_detokenize

    The name of the External Tokenization policy.

  • as (val string) returns string

    Specifies the input and output data types. The data types must match.

  • ->

    Do not replace the arrow in the masking policy statement with any other character(s).

  • case ... end;

    Specifies the masking policy body (i.e. SQL expression) conditions.

    In these two examples, if the query operator is using the PAYROLL custom role in the current session, the operator sees the unmasked/detokenized value. Otherwise, a fixed masked/tokenized value is seen.

  • ssn_unprotect

    The external function to operate on the tokenized data.

For more details on using masking policies, see:

Masking Policies at Query Runtime

Snowflake supports nested masking policies, such as a masking policy on a table and a masking policy on a view for the same table. At query runtime, Snowflake evaluates all masking policies that are relevant to a given query in the following sequence:

  • The masking policy that is applicable to the table is always executed first.

  • The policy for the view is executed after evaluating the policy for the table.

  • If nested views exist (e.g. Table 1 -> View 1 -> View 2 -> … View n), the policies are applied in sequential order from left to right.

This pattern continues for however many masking policies exist with respect to the data in the query. The following diagram illustrates the relationship between a query performer, tables, views, and policies.

Masking policies with tables and views.

At runtime, Snowflake rewrites the query to apply the masking policy expression to the columns specified in the masking policy. The masking policy is applied to the column regardless of where in a SQL expression the column is referenced, including:

  • Projections.

  • JOIN predicates.

  • WHERE clause predicates.

  • ORDER BY and GROUP BY clauses.

Important

A masking policy is deliberately applied wherever the relevant column is referenced by a SQL construct to prevent the de-anonymization of data through creative queries to include masked column data.

Therefore, if executing a query results in masked data in one or more columns, the query output may not provide the anticipated value because the masked data prevents evaluating all of the query output data in the desired context.

The following example shows a user-submitted query followed by the Snowflake runtime query rewrite in which the masking policy (i.e. <SQL_expression>) applies to the email column only.

Simple query:

SELECT email, city from customers where city = 'San Mateo';

SELECT <SQL_expression>(email), city from customers where city = 'San Mateo';

The following examples show a user-submitted query followed by the Snowflake runtime query rewrite in which the masking policy (i.e. <SQL_expression>) applies to only one side of a comparison (e.g., the email column but not the string to which the email column is compared). The results of the query are not what the user intended. Masking only one side of a comparison is a common anti-pattern.

Query with a protected column in the WHERE clause predicate (anti-pattern):

SELECT email from customers where email = 'user@example.com';

SELECT <SQL_expression>(email) from customers where <SQL expression>(email) = 'user@example.com';

Query with a protected column in the JOIN predicate (anti-pattern):

select b.email, d.city from sf_tuts.public.emp_basic as b join sf_tuts.public.emp_details as d on b.email = d.email;

select <SQL_expression>(b.email), d.city from sf_tuts.public.emp_basic as b join sf_tuts.public.emp_details as d on <SQL expression>(b.email) = <SQL expression>(d.email);

Query Runtime Considerations

Snowflake recommends considering the following factors when trying to predict the effect of applying a masking policy to a column and whether the query operator sees masked data:

The current session

Masking policy conditions using CURRENT_ROLE.

The executing role

Masking policy conditions using INVOKER_ROLE.

Role hierarchy

Masking policy conditions using IS_ROLE_IN_SESSION or IS_GRANTED_TO_INVOKER_ROLE.

Data sharing

Whether the data is shared using Snowflake Secure Data Sharing. See Limitations (in this topic).

Replication

See Replicated Database Objects.

The first three items are explained in greater detail in Advanced Column-level Security Topics. Data sharing only applies to Dynamic Data Masking because external functions cannot be invoked in the context of a share.

Ultimately, the specific use case determines whether Dynamic Data Masking or External Tokenization is the best fit.

Choosing Dynamic Data Masking or External Tokenization

To choose the correct feature that best meets the need of your organization, evaluate the major use cases for your data along with relevant considerations and limitations. The following two sections summarize the benefits and limitations between the two features.

Benefits

The following table compares the benefits of Dynamic Data Masking and External Tokenization.

Factor

Dynamic Data Masking

External Tokenization

Notes

Pre-load tokenized Data.

Unauthorized users never see the real data value. Requires third-party tokenization provider.

Pre-load unmasked data.

Only need built-in Snowflake functionality, no third-parties required.

Data Sharing.

External functions cannot be used with Data Sharing, therefore no External Tokenization.

Ease of use and Change management.

Write a policy once and have it apply to thousands of columns across databases and schemas.

Data administration and SoD.

A security or privacy officer decides which columns to protect, not the object owner. . Masking policies are easy to manage and support centralized and decentralized administration models.

Data authorization and governance.

Contextual data access by role or custom entitlements.

Supports data governance as implemented by security or privacy officers and can prohibit privileged users with the ACCOUNTADMIN or SECURITYADMIN role from unnecessarily viewing data.

Database Replication

See: Replicated Database Objects.

Limitations

The following table describes the current limitations for column-level security. A checkmark (i.e. ✔) indicates a limitation or lack of current support for the feature.

Limitation

Dynamic Data Masking

External Tokenization

Notes

Shared objects.

For Dynamic Data Masking, as a data sharing consumer, you cannot apply a masking policy to a shared database or table. . As a workaround, import the shared database or table and apply the masking policy to a local view on that shared database or table. . For External Tokenization, Secure Data Sharing is not applicable because external functions cannot be invoked in the context of a share.

Materialized views (MV).

The following actions are not supported: associating a masking policy on an MV column, creating an MV from a table that has a masking policy on one or more of its columns, and associating a masking policy to a table column if there is a MV on that table.

DROP MASKING POLICY

Prior to dropping a policy, unset the policy from the table or view column using ALTER TABLE … ALTER COLUMN or ALTER VIEW.

DROP DATABASE and DROP SCHEMA

Requires the masking policy and its mappings to be self-contained within a database and schema. . For example, database_1 contains policy_1 and policy_1 is only used in database_1.

Virtual columns

Masking policies cannot be applied to virtual columns. Apply the policy to the source table column or view column.

Multiple columns in a single statement.

Only one column per ALTER TABLE or ALTER VIEW statement. Execute a single ALTER statement on a column to set or unset a column-level security masking policy.

Masking policy change management.

You can optionally store and track masking policy changes in a version control system of your choice.

Microsoft Azure cloud platform

Support for External Tokenization on the Microsoft Azure cloud platform is planned.

Google Cloud Platform

Support for External Tokenization on Google Cloud Platform is planned.

Using Column-level Security on Tables and Views

Snowflake supports masking policies with tables and views. The following describes how masking policies affect tables and views in Snowflake.

Apply Masking Policies to Columns

To apply masking policy policies to columns in tables or views, execute the following statements:

-- table
ALTER TABLE IF EXISTS user_info MODIFY COLUMN ssn_number SET MASKING POLICY ssn_mask;

-- view
ALTER VIEW user_info_v MODIFY COLUMN ssn_number SET MASKING POLICY ssn_mask_v;

For more information on syntax and usage, see ALTER TABLE … ALTER COLUMN and ALTER VIEW.

Obtain Columns with a Masking Policy

To obtain a list of columns with masking policies, execute the following statement. For more information, see POLICY_REFERENCES.

SELECT * from table(information_schema.policy_references(policy_name=>'<policy_name>'));

Execute a DESCRIBE TABLE or DESCRIBE VIEW statement to view the masking policy on column in a table or view.

External Tables

You can apply the masking policy to the external table VALUE column by executing an ALTER EXTERNAL TABLE statement on the external table.

ALTER EXTERNAL TABLE <name> MODIFY COLUMN VALUE SET MASKING POLICY <policy_name>;

You cannot apply the masking policy to a virtual column directly. Instead, create a view on the external table and apply a masking policy to the columns on the view.

Streams

Masking policies on columns in a table carry over to a stream on the same table.

The result is that unauthorized users see masked data; streams created by authorized users see the data as defined by the masking policy.

CLONED Objects

The following approach helps to safeguard data from users with the SELECT privilege when accessing a cloned object.

  • Cloning an individual masking policy object is not supported.

  • Cloning a schema results in the cloning of all masking policies within the schema.

  • A cloned table maps to the same masking policies as the source table.

    • When a table is cloned in the context of its parent schema cloning, if the source table has a reference to a masking policy in the same parent schema (i.e. a local reference), the cloned table will have a reference to the cloned masking policy.

    • If the source table refers to a masking policy in a different schema (i.e. a foreign reference), then the cloned table retains the foreign reference.

For more information, see CREATE <object> … CLONE.

CREATE TABLE … AS SELECT (CTAS) Statements

Executing a CREATE TABLE … AS SELECT (CTAS) statement applies any masking policies on columns included in the statement before the data is populated in the new table (i.e. the applicable column data is masked in the new table). This flow is adhered to because a table created using a CTAS statement may have a different set of columns than the source objects, and Snowflake cannot apply masking policies to the new table columns implicitly.

If there is a need to copy unmasked data, use a role authorized for protected data to run the CTAS statement. After creating the new table, transfer ownership of the new table to another role and ask the masking policy administrator to apply the masking policies to the columns of the new table.

For more information, see CREATE TABLE.

Queries using Aggregate Functions and Masked Columns

It is possible to use Aggregate Functions on columns with masked data.

A representative use case is that a data analyst wants to obtain the COUNT for a column of social security numbers without needing to see the actual data. However, if the data analyst runs a query using SELECT on a masked table column, the query returns a fixed masked value. Users with the PAYROLL custom role in the current session see the unmasked data and everyone else sees masked data.

To achieve this outcome:

  1. The table owner creates a view on the column that contains the aggregate function.

    CREATE VIEW ssn_count AS SELECT DISTINCT(ssn) FROM table1;
    
  2. Grant the ANALYST role full privileges on the view. Do not grant the analyst any privileges on the table.

  3. Apply a masking policy to the table column. Note that the table policy is always applied before the view policy, if there is a policy on a view column.

    case
      when current_role() in ('PAYROLL') then val
      else '***MASKED***'
    end;
    
  4. Execute a query on the view column.

    use role ANALYST;
    select count(DISTINCT SSN) from <view>;
    

Query Profile

When used on a column with a masking policy, the EXPLAIN command output includes the masked data, not the masking policy body.

The following example generates the EXPLAIN plan for a query on a table of employee identification numbers and social security numbers. The command in this example generates the example in JSON format.

The column containing the social security numbers has a masking policy.

explain using json select * from mydb.public.ssn_record;
{
  "GlobalStats": {
    "partitionsTotal": 0,
    "partitionsAssigned": 0,
    "bytesAssigned": 0
  },
  "Operations": [
    [
      {
        "id": 0,
        "operation": "Result",
        "expressions": [
          "1",
          "'**MASKED**'"
        ]
      },
      {
        "id": 1,
        "parent": 0,
        "operation": "Generator",
        "expressions": [
          "1"
        ]
      }
    ]
  ]
}

Unloading Data

Using the COPY INTO <location> command on a column that has a masking policy results in the masking policy being applied to the data. Therefore, unauthorized users see masked data after executing the command.

Managing Column-level Security

This section provides information useful for determining your overall management approach to masking policies, describes the privileges required to manage column-level security, and lists supported DDL commands.

Choosing a Centralized, Hybrid, or Decentralized Approach

To manage Dynamic Data Masking and External Tokenization policies effectively, it is helpful to consider whether your approach to masking data in columns should follow a centralized security approach, a decentralized approach, or a hybrid of each of these two approaches.

The following table summarizes some of the considerations with each of these two approaches.

Policy Action

Centralized Management

Hybrid Management

Decentralized Management

Create policies

Security officer

Security officer

Individual teams

Apply policies to columns

Security officer

Individual teams

Individual teams

As a best practice, Snowflake recommends that your organization gathers all relevant stakeholders to determine the best management approach for implementing column-level security in your environment.

Column-level Security Privileges

This section describes the column-level security masking policy privileges and how they apply to a centralized, decentralized, or hybrid management approach.

Snowflake provides the following privileges for column-level security masking policies.

Privilege

Usage

APPLY

Enables applying the policy [un]set operations for the masking policy.

OWNERSHIP

Transfers ownership of a masking policy, which grants full control over the masking policy. Required to alter most properties of a masking policy.

Note

Operating on a masking policy also requires the USAGE privilege on the parent database and schema.

The following examples show how granting privileges apply to different management approaches. After granting the APPLY privilege to a role, the masking policy can be set on a table column using an ALTER TABLE … ALTER COLUMN statement or set on a view column using an ALTER VIEW statement (by a member of the role with the APPLY privilege on the masking policy).

Centralized Management

In a centralized management approach, only the security officer custom role (e.g. SECURITY_OFFICER) creates and applies masking policies to columns in tables or views. This approach can provide the most consistency in terms of masking policy management and masking sensitive data.

-- create a SECURITY_OFFICER custom role

use role accountadmin;
create role security_officer;

-- grant CREATE AND APPLY masking policy privileges to the SECURITY_OFFICER custom role.

grant create masking policy on schema <schema_name> to role security_officer;

grant apply masking policy on account to role security_officer;

Where:

  • schema_name

    Specifies the identifier for the schema; must be unique for the database in which the schema is created.

    In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. “My object”). Identifiers enclosed in double quotes are also case-sensitive.

    For more details, see Identifier Requirements.

Hybrid Management

In a hybrid management approach, the security officer custom role (e.g. SECURITY_OFFICER) creates masking policies and individual teams (e.g. finance, payroll, human resources) apply the masking policies to columns in tables or views owned by the teams. This approach can lead to consistent policy creation and maintenance but requires individual teams to have the increased responsibility to mask sensitive data.

-- create a SECURITY_OFFICER custom role

use role accountadmin;
create role security_officer;

-- grant the CREATE masking policy privilege to the SECURITY_OFFICER role.

grant create masking policy on schema <schema_name> to role security_officer;

The SECURITY_OFFICER custom role grants the APPLY privilege to the human resources team (i.e. users with the HUMAN_RESOURCES custom role) to mask social security numbers (e.g. masking policy: ssn_mask) in columns for objects owned by the HUMAN_RESOURCES custom role.

grant apply on masking policy ssn_mask to role human_resources;

Where:

  • grant apply on masking policy policy_name to role role_name;

    Used by a policy owner to decentralize the unset and set operations of a given masking policy on columns to the object owners.

    This privilege supports discretionary access control where object owners are also considered data stewards.

Decentralized Approach

In a decentralized management approach, individual teams create and apply masking policies to columns in tables or views. This approach can lead to inconsistent policy management, with the possibility of sensitive data not being masked properly, since individual teams assume all responsibility for managing masking policies and masking sensitive data.

In this representative example, the support team (i.e. users with the custom role SUPPORT) and the finance team (i.e. users with the custom role FINANCE) can create masking policies. Note that these custom roles may not include the SECURITY_OFFICER custom role.

use role accountadmin;

-- grant masking policy privileges to the SUPPORT custom role.

grant create masking policy on schema <schema_name> to role support;

-- grant masking policy privileges to the FINANCE custom role.

grant create masking policy on schema <schema_name> to role finance;

The support team grants the APPLY privilege to the human resources team (i.e. users with the custom role HUMAN_RESOURCES) to mask social security numbers (e.g. masking policy: ssn_mask) in columns for objects owned by the HUMAN_RESOURCES custom role.

use role support;
grant apply on masking policy ssn_mask to role human_resources;

The finance team grants the APPLY privilege to the internal audit team (i.e. users with the custom role AUDIT_INTERNAL) to mask cash flow data (e.g. masking policy: cash_flow_mask) in columns for objects owned by the AUDIT_INTERNAL custom role.

use role finance;
grant apply on masking policy cash_flow_mask to role audit_internal;

For more information on masking policy privileges, see:

Column-level Security DDL

Snowflake provides the following set of commands to manage column-level security policies.

The following table summarizes the relationship between the column-level security DDL operations and their necessary privileges.

Operation

Privilege required

Create masking policy

A role with the CREATE MASKING POLICY on SCHEMA privilege.

Alter masking policy

The masking policy owner (i.e. the role with the OWNERSHIP privilege on the masking policy).

Drop masking policy

The masking policy owner (i.e. the role with the OWNERSHIP privilege on the masking policy).

Show masking policies

One of the following: . A role with the global APPLY MASKING POLICY privilege, or . The masking policy owner (i.e. the role with the OWNERSHIP privilege on the masking policy).

Describe masking policy

One of the following: . A role with the global APPLY MASKING POLICY privilege or . The masking policy owner (i.e. the role with the OWNERSHIP privilege on the masking policy).

SET/UNSET masking policy on columns

Either: . A role with the global APPLY MASKING POLICY privilege, or . A role with APPLY on MASKING POLICY privilege on a given masking policy and has OWNERSHIP on the target table/view.

List of columns having masking policy

One of the following: . The role with the APPLY MASKING POLICY privilege, or . The role with the APPLY on MASKING POLICY privilege on a given masking policy and has OWNERSHIP on the target object.

Using UDFs in a masking policy

If creating a new or altering an existing masking policy, the policy administrator role must have usage on the UDF, all scalar UDFs in the policy expression should have the same data type, and the UDF must exist. . At the query runtime, Snowflake verifies if the UDF exists; if not, the SQL expression will not resolve and the query fails.

Next Topics: