Sensitive data classification¶
This topic provides information on how sensitive data classification works.
For information on how to use custom classifiers, see Custom sensitive data classification.
Overview¶
Sensitive data classification is a multi-step process that associates Snowflake-defined system tags to columns by analyzing the fields and metadata for personal data; this data can be tracked by a data engineer using SQL and Snowsight. A data engineer can classify columns in a table to determine whether the column contains certain kinds of data that need to be tracked or protected, such a unique identifier (passport or bank account data), a quasi-identifier (the city in which the individual lives), or a sensitive value (the salary of an individual).
By tracking the data with a system tag and protecting the data by using a masking or row access policy, the data engineer can improve the governance posture associated with the data. The overall result of the classification and data protection steps is to facilitate compliance with data privacy regulations.
You can classify a single table or tables in a schema. Snowflake provides predefined system tags to enable you to classify and tag columns, or you can use custom classifiers to define your own semantic category based on your knowledge of your data. You can also choose an approach the uses Snowflake system tags and custom classifiers depending on the governance posture that you wish to adopt.
Classification provides the following benefits to data privacy and data governance administrators:
- Data access:
The results of classifying column data can inform identity and access management administrators to evaluate and maintain their Snowflake role hierarchies to ensure the Snowflake roles have the appropriate access to sensitive or PII data.
- Data sharing:
The classification process can help to identify and confirm the storage location of PII data. Subsequently, a data sharing provider can use the classification results to determine whether to share data and how to make the PII data available to a data sharing consumer.
- Policy application:
The usage of columns containing PII data, such as referencing columns in base tables to create a view or materialized view, can help to determine the best approach to protect the data with either a masking policy or a row access policy.
Supported objects and data types¶
Snowflake supports classifying data stored in all types of tables and views, including external tables, materialized views, and secure views.
Note that Snowflake does not support classification on shared tables and shared schemas from the consumer’s side. If a table is created by the provider and placed into the provider’s outbound share, the classification will work only if it is called from the provider’s side.
You can classify table and view columns for all supported data types except for the following data types:
ARRAY
BINARY
GEOGRAPHY
OBJECT
VARIANT
Note that you can classify a column with the VARIANT data type when the column data type can be cast to a NUMBER or STRING data type. Snowflake does not classify the column if the column contains JSON, XML, or other semi-structured data.
VECTOR
If a table contains columns that are not of a supported data type or the column contains all NULL values, the classification process ignores the columns and does not include them in the output.
Important
If your data represents NULL values with a value other than NULL, the accuracy of the classification results may be impacted.
Recommendations¶
To capitalize on the Classification feature and optimize your PII data tracking capabilities, do the following:
- Validation:
Query Account Usage views first:
ACCESS_HISTORY: determine the table and view objects that are accessed most frequently.
OBJECT_DEPENDENCIES: determine metadata references between two or more objects.
Use the query results to prioritize schema-level or database-level assignment of the Classification system tags.
- Column names:
Use sensible column names in your table objects and train table creators to adhere to internal table creation guidelines.
- Data types:
Use sensible data types for columns. For example, an AGE column should have the NUMBER data type.
- VARIANT:
If a column has a VARIANT data type, use the FLATTEN command on the column prior to classifying the table.
Classify tables and schemas¶
After you’ve defined any custom classifiers that you want to use, you’re ready to classify your sensitive data. You can use the following approaches:
Manually classify a specific table. You can start the classification process using Snowsight or by executing a SQL command.
Set up a classification profile for a schema so the tables in the schema are automatically classified.
Manage sensitive data classification¶
Privilege reference¶
The privilege model for Data Classification enables the data privacy administrator to determine which personas can classify tables and tag columns. For example, a single role can have all of the necessary privileges, or the data privacy administrator can delegate grants to different roles to satisfy separation of duties (SoD) requirements. One example of a viable grant combination is shown in the Get started classifying data section of Classify sensitive data manually.
As an administrator, you have different options depending on how you want to manage which roles or personas are involved. The options provide flexibility in the governance posture that you wish to adopt. For example:
The table owner (the role with the OWNERSHIP privilege on the table) can classify the table and set system tags on the columns.
A custom role that has the SELECT privilege on the table and the APPLY TAG privilege on the account can classify the table and set system tags on the columns.
If you want different roles or personas to be involved with classifying and tagging columns, you could grant the SELECT privilege on the table to one role and the APPLY TAG privilege on the account to a different role.
This following table summarizes the different grant options to classify a table, set the Data Classification system tags on columns, and do both of these tasks:
Privilege or role |
Classify table(s) |
Set system tags on columns |
---|---|---|
SELECT on the table or view. |
✔ |
|
OWNERSHIP on the table. |
✔ |
✔ |
APPLY TAG on the account. |
✔ |
|
ACCOUNTADMIN role. |
✔ |
|
OWNERSHIP on the database or schema. |
Important
Classifying tables requires a running warehouse. The role that is used to classify a table must have the USAGE privilege on a warehouse at a minimum.
You can grant the SNOWFLAKE.GOVERNANCE_VIEWER database role to an account role to enable users with that account role to query the DATA_CLASSIFICATION_LATEST view to see the most recent results of a classified table.