Introduction to sensitive data classification
It’s critical to know where your sensitive data resides and whether it’s adequately protected. This is more than a best practice: in many industries, it is a requirement for regulatory compliance. Snowflake provides a solution that automatically discovers sensitive data and makes it easy to apply governance controls such as tags and masking policies.
Snowflake classifies sensitive data into native categories like name and national identifier, but you can also create your own custom categories to detect sensitive data that is specific to your organization or domain.
Get started
Snowflake provides a web interface to configure sensitive data classification and to view the governance posture of sensitive data.
To get started, do one of the following:
To set up sensitive data classification, see Use the Trust Center to set up sensitive data classification.
To view the results of sensitive data classification, see Use the Trust Center to view results.
Core concepts of sensitive data classification
About classification categories
With sensitive data classification, every column that is identified as containing sensitive data is assigned two categories: a semantic category and a privacy category.
A semantic category identifies the type of personal attribute. Snowflake provides native categories for common attributes such as names and addresses. If your sensitive data doesn’t fit into a native category, you can create a custom category for it.
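A privacy category identifies the sensitivity of a personal attribute. It is one of IDENTIFIER (an attribute that uniquely identifies an individual, such as a name), QUASI_IDENTIFIER (an attribute that can identify an individual when combined with other attributes, such as age or gender), or SENSITIVE (a generic, non-identifying category for attributes such as salary).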
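In SQL, custom categories are defined through a custom classifier. The following is a minimal sketch, assuming the `SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER` class and an `ADD_REGEX` method that takes a semantic category, a privacy category, a value regex, an optional column-name regex, a description, and a threshold, in that order. Confirm the signature in the custom classifier reference before relying on it. The schema, classifier name, category name, and patterns are hypothetical.

```sql
-- Hypothetical schema that stores the custom classifier.
USE SCHEMA governance_db.classifiers;

-- Create an instance of the custom classifier class.
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER employee_ids();

-- Add a hypothetical custom semantic category and its privacy category.
CALL employee_ids!ADD_REGEX(
  'EMPLOYEE_ID',                                      -- custom semantic category
  'IDENTIFIER',                                       -- privacy category
  'EMP-[0-9]{6}',                                     -- regex that column values must match
  '.*EMP.*ID.*',                                      -- optional regex for column names
  'Internal employee identifiers such as EMP-123456', -- description
  0.8                                                 -- share of values that must match
);
```

To use a classifier like this during automatic classification, you would then add it to a classification profile.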
About classification profiles
When you use the Trust Center web interface to specify classification settings, those settings are saved as a classification profile. You can edit the profile later to change how data is classified. In the web interface, the classification profile also controls which databases are classified with the profile’s settings.
You can also use SQL commands to create and modify a classification profile. When you use SQL, associating the profile with a database, which starts the classification process, is a separate step.
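A minimal SQL sketch of those two steps follows. It assumes the `SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE` class and the profile keys shown (`minimum_object_age_for_classification_days`, `maximum_classification_validity_days`, `auto_tag`). The database, schema, and profile names are hypothetical. Check the classification profile reference for the full set of supported keys.

```sql
-- Step 1: create the profile (hypothetical database and schema).
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
  governance_db.classify_sch.my_classification_profile(
    {
      'minimum_object_age_for_classification_days': 0,
      'maximum_classification_validity_days': 30,
      'auto_tag': true
    });

-- Step 2: associate the profile with a database to start classification.
ALTER DATABASE sales_db
  SET CLASSIFICATION_PROFILE = 'governance_db.classify_sch.my_classification_profile';
```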
Protecting sensitive data
Snowflake provides the governance tools you need to track and protect your sensitive data.
You can configure the classification process so that Snowflake automatically assigns system tags and user-defined tags to the data it classifies as sensitive. You can then track sensitive data across your data estate by tracking those tags.
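When automatic tagging is enabled, Snowflake records the assigned categories in the system tags `SNOWFLAKE.CORE.SEMANTIC_CATEGORY` and `SNOWFLAKE.CORE.PRIVACY_CATEGORY`. As a sketch, you could track those tags across the account with the ACCOUNT_USAGE `TAG_REFERENCES` view. ACCOUNT_USAGE views can lag behind recent changes, so very new tag assignments might not appear yet.

```sql
-- Columns carrying the system classification tags, account-wide.
SELECT object_database,
       object_schema,
       object_name,
       column_name,
       tag_name,
       tag_value
  FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES
  WHERE tag_database = 'SNOWFLAKE'
    AND tag_schema = 'CORE'
    AND tag_name IN ('SEMANTIC_CATEGORY', 'PRIVACY_CATEGORY')
    AND domain = 'COLUMN'
  ORDER BY object_database, object_schema, object_name, column_name;
```

To track a user-defined tag instead, replace the tag database, schema, and name filters with those of your own tag.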
You can assign a masking policy to columns that contain sensitive data to selectively mask the data at query time.
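A minimal sketch of a masking policy assigned directly to one column follows. The policy, role, table, and column names are hypothetical. The policy returns the real value only to the `PII_ADMIN` role and a fixed placeholder string to every other role.

```sql
-- Hypothetical policy for string columns.
CREATE OR REPLACE MASKING POLICY governance_db.policies.mask_string_pii
  AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to a column that contains sensitive data.
ALTER TABLE sales_db.public.customers
  MODIFY COLUMN email
  SET MASKING POLICY governance_db.policies.mask_string_pii;
```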
You can combine tags and masking policies to automatically mask data that is classified as sensitive. If you use tag-based masking to associate a masking policy with a user-defined tag, the data is automatically masked when Snowflake applies that tag during classification. As new data is added to a database and classified, Snowflake applies the tag, and therefore its masking policy, to the new columns that contain sensitive data.
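The following sketch shows the tag-based masking part of this setup: a user-defined tag with a masking policy attached to it. All object names and the role are hypothetical. The separate step that tells the classification profile which semantic categories should receive this tag is configured in the profile and is not shown.

```sql
-- Hypothetical user-defined tag that classification assigns to sensitive columns.
CREATE TAG IF NOT EXISTS governance_db.tags.pii_level
  COMMENT = 'Applied by sensitive data classification';

-- Hypothetical policy for STRING columns that carry the tag.
CREATE OR REPLACE MASKING POLICY governance_db.policies.mask_tagged_string
  AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
    ELSE '***MASKED***'
  END;

-- Tag-based masking: any STRING column that receives the tag is masked.
ALTER TAG governance_db.tags.pii_level
  SET MASKING POLICY governance_db.policies.mask_tagged_string;
```

A tag can hold only one masking policy per data type. Columns of other types, such as NUMBER, would need their own policy attached to the same tag.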
Determine which databases are being classified
You can determine which data is being monitored for sensitive data classification by listing the databases that are associated with a classification profile. If a database is associated with a classification profile, the tables in that database, and its views if view classification is enabled, are automatically classified according to the criteria defined in the profile.
To determine which databases are being classified by using Snowsight:
Sign in to Snowsight as a user with the required privileges.
In the navigation menu, select Governance & security » Trust Center.
Select the Data Security tab.
Select the Dashboard tab.
Find the Databases monitored by classification tile. To list the databases being classified, select Monitored or Partially monitored.
Note
A database is partially monitored if someone used SQL to set a classification profile directly on a schema in the database rather than setting the profile at the database level.
Alternatively, use the SYSTEM$SHOW_SENSITIVE_DATA_MONITORED_ENTITIES function to list the databases that are associated with a classification profile. For example:
SELECT SYSTEM$SHOW_SENSITIVE_DATA_MONITORED_ENTITIES('DATABASE');
Cost considerations
Sensitive data classification consumes credits because it uses serverless compute resources to classify the tables in a database. For more information about pricing for this consumption, see Table 5 in the Snowflake Service Consumption Table.
Note
Classifying views can cost more than classifying tables. The additional cost depends on the complexity of the query that defines the view. Materialized views don’t incur this additional cost. By default, views are excluded from classification.
View costs in Snowsight
To explore the cost of sensitive data classification:
Sign in to Snowsight.
Switch to a role with access to cost and usage data.
In the navigation menu, select Admin » Cost management.
Select a warehouse to use for viewing the usage data. Snowflake recommends using an X-Small (XS) warehouse for this purpose.
Select Consumption.
From the Usage Type drop-down, select Compute.
From the Service Type drop-down, select Sensitive Data Classification.
Use SQL to query costs
You can query views in the ACCOUNT_USAGE and ORGANIZATION_USAGE schemas to determine how much you have spent on automatically classifying sensitive data. To monitor consumption, query the following views:
- METERING_HISTORY view (ACCOUNT_USAGE)
Lets you retrieve the hourly cost of automatic classification by filtering on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. For example:
SELECT service_type, start_time, end_time, entity_id, name, credits_used_compute, credits_used_cloud_services, credits_used, budget_id
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY
WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION';
- METERING_DAILY_HISTORY view (ACCOUNT_USAGE and ORGANIZATION_USAGE)
Lets you retrieve the daily cost of automatic classification by filtering on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. For example:
SELECT service_type, usage_date, credits_used_compute, credits_used_cloud_services, credits_used
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY
WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION';
- USAGE_IN_CURRENCY_DAILY view (ORGANIZATION_USAGE)
Lets you retrieve the daily cost of automatic classification by filtering on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. Use this view to determine the cost in currency rather than in credits.
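To see trends rather than individual days, you can aggregate the daily view. The following minimal sketch totals classification credits per month over the last six months. The six-month window is an arbitrary choice.

```sql
-- Monthly credits consumed by sensitive data classification.
SELECT DATE_TRUNC('MONTH', usage_date) AS usage_month,
       SUM(credits_used) AS total_credits
  FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY
  WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION'
    AND usage_date >= DATEADD('MONTH', -6, CURRENT_DATE())
  GROUP BY usage_month
  ORDER BY usage_month;
```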
Supported objects
Snowflake supports classifying data stored in all types of Snowflake tables and views.
However, Snowflake does not support classifying shared tables and shared schemas from the consumer side. If a provider creates a table and adds it to an outbound share, the table can be classified only from the provider side.
Supported data types
You can classify table and view columns of all supported data types except the following:
ARRAY
BINARY
DECFLOAT
GEOGRAPHY
OBJECT
VARIANT (except when the column data type is cast to a NUMBER or STRING data type)
VECTOR
Note
Unstructured data, such as long text stored in columns, is not supported.
JSON, XML, and other semi-structured data are not supported.
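To anticipate which columns will be skipped, you could list columns whose types appear in the list above. The following minimal sketch runs against a hypothetical `sales_db` database. It assumes that `INFORMATION_SCHEMA.COLUMNS` reports these types under the names shown, so verify this for newer types such as DECFLOAT and VECTOR. The query only sees objects that the current role can access. It also lists VARIANT columns even though they may be classified when cast to NUMBER or STRING, and it cannot detect long unstructured text.

```sql
-- Columns whose data types are not supported for classification.
SELECT table_schema,
       table_name,
       column_name,
       data_type
  FROM sales_db.INFORMATION_SCHEMA.COLUMNS
  WHERE table_schema <> 'INFORMATION_SCHEMA'
    AND data_type IN ('ARRAY', 'BINARY', 'DECFLOAT', 'GEOGRAPHY',
                      'OBJECT', 'VARIANT', 'VECTOR')
  ORDER BY table_schema, table_name, column_name;
```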
Limitations and considerations
Classification profiles cannot be set on a reader account.
A classification profile cannot be set on more than 1,000 databases.
A classification profile cannot be directly set on more than 10,000 schemas.
A maximum of 100 million tables can be classified in a schema.
You cannot automatically classify a table if it has any of the following characteristics:
More than 10,000 columns.
A column with a name that has more than 255 characters.
A column with a name that includes the $ character.
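The following minimal sketch flags tables in a hypothetical `sales_db` database that hit any of these column limits. It relies on `INFORMATION_SCHEMA.COLUMNS`, so it sees only objects that the current role can access. It does not check the database, schema, or table-count limits listed earlier.

```sql
-- Tables that cannot be classified automatically because of column limits.
SELECT table_schema,
       table_name,
       COUNT(*) AS column_count,
       MAX(LENGTH(column_name)) AS longest_column_name,
       COUNT_IF(CONTAINS(column_name, '$')) AS columns_with_dollar_sign
  FROM sales_db.INFORMATION_SCHEMA.COLUMNS
  WHERE table_schema <> 'INFORMATION_SCHEMA'
  GROUP BY table_schema, table_name
  HAVING COUNT(*) > 10000
      OR MAX(LENGTH(column_name)) > 255
      OR COUNT_IF(CONTAINS(column_name, '$')) > 0
  ORDER BY table_schema, table_name;
```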