Introduction to sensitive data classification¶

It’s critical to know where your sensitive data resides and whether it’s adequately protected. This isn’t just a best practice; in many industries it’s a requirement for maintaining regulatory compliance. Snowflake provides a solution that automatically discovers sensitive data and makes it easy to apply governance controls such as tags and masking policies.

Snowflake classifies sensitive data into native categories like name and national identifier, but you can also create your own custom categories to detect sensitive data that is specific to your organization or domain.

Get started¶

Snowflake provides a web interface to configure sensitive data classification and to view the governance posture of sensitive data.

To get started, do one of the following:

  • Use the Trust Center in Snowsight to specify classification settings and choose which databases to classify with those settings.

  • Use SQL commands to create a classification profile and associate it with a database.

Core concepts of sensitive data classification¶

About classification categories¶

With sensitive data classification, every column that is identified as containing sensitive data is assigned two categories: a semantic category and a privacy category.

  • A semantic category identifies the type of personal attribute. Snowflake provides native categories for common attributes such as names and addresses. If your sensitive data doesn’t fit into a native category, you can create a custom category for it.

  • A privacy category identifies the sensitivity of a personal attribute. It can be IDENTIFIER, QUASI_IDENTIFIER, or SENSITIVE (a generic, non-identifier category for attributes such as salary).

About classification tags¶

A tag is a Snowflake object that can be assigned to a column. Snowflake uses the following system-defined tags to identify columns that it has classified as containing sensitive data.

  • SNOWFLAKE.CORE.SEMANTIC_CATEGORY: Tag used to identify the native or custom category of the data in a column.

  • SNOWFLAKE.CORE.PRIVACY_CATEGORY: Tag used to identify the privacy category of the data in a column.

You can map user-defined tags to system-defined classification tags. For example, you can set up a tag map so that every time the system tag SNOWFLAKE.CORE.SEMANTIC_CATEGORY = 'NAME' is applied to a column, the user-defined tag tag_db.sch.pii = 'Highly confidential' is also applied.
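
For example, after classification has run, you can check which classification tags were applied to a table’s columns. The following is a minimal sketch that uses the INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS table function; the database, schema, and table names (my_db.sch.customers) are hypothetical:

-- List the classification tags (and their values) set on each column of a table.
SELECT column_name, tag_name, tag_value
  FROM TABLE(my_db.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS('my_db.sch.customers', 'table'))
  WHERE tag_name IN ('SEMANTIC_CATEGORY', 'PRIVACY_CATEGORY');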

About classification profiles¶

When you use the Trust Center web interface to specify classification settings, those settings are saved as a classification profile. This classification profile can be edited later to change the settings that control how data is classified. In the web interface, the classification profile also controls which databases are being classified with the profile’s settings.

You can also use SQL commands to create and modify a classification profile. If you are using SQL, associating the classification profile with a database to start the classification process is a separate step.
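
As a minimal sketch of the SQL path (the profile name is hypothetical, and the option keys and the ALTER DATABASE parameter shown here are assumptions to verify against the CLASSIFICATION_PROFILE reference), you might create a profile and then associate it with a database in two steps:

-- Step 1: Create the classification profile (assumed option key).
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
  my_db.sch.my_classification_profile({ 'auto_tag': true });

-- Step 2: Associating the profile with a database starts classification for that database.
ALTER DATABASE my_db SET CLASSIFICATION_PROFILE = 'my_db.sch.my_classification_profile';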

Protecting sensitive data¶

Snowflake provides the governance tools you need to track and protect your sensitive data.

  • You can configure the classification process so Snowflake automatically assigns system-defined and user-defined tags to data that it classifies as sensitive. You can then track that data across your data estate by following the tags.

  • You can assign a masking policy to columns that contain sensitive data to selectively mask the data at query time.

  • You can combine tagging and masking policies to automatically mask data that is classified as sensitive. If you use tag-based masking to associate a masking policy with a user-defined tag, the data will be automatically masked when Snowflake applies the tag as part of the classification process. As new data is added to a database, the tag-based masking policies are automatically assigned to the columns that contain sensitive data.
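
For example, tying tag-based masking to the user-defined tag from the earlier tag-map example might look like the following sketch; the policy name and the PII_READER role are hypothetical:

-- Masking policy that reveals values only to an approved role.
CREATE MASKING POLICY tag_db.sch.mask_pii_string AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val ELSE '***MASKED***' END;

-- Any column that receives the tag_db.sch.pii tag is then masked by this policy.
ALTER TAG tag_db.sch.pii SET MASKING POLICY tag_db.sch.mask_pii_string;

With this in place, columns that classification tags through the tag map are masked at query time for every role except PII_READER.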

Determine which databases are being classified¶

You can determine what data is being monitored for sensitive data classification by listing the databases that are associated with a classification profile. If a database is associated with a classification profile, all the tables and views in that database are being automatically classified according to the criteria defined in the profile.

To determine which databases are being classified:

  1. Sign in to Snowsight as a user with the required privileges.

  2. In the navigation menu, select Governance & security » Trust Center.

  3. Select the Data Security tab.

  4. Select the Dashboard tab.

  5. Find the Databases monitored by classification tile. To list the databases being classified, select Monitored or Partially monitored.

Note

A database is partially monitored if someone used SQL to set a classification profile directly on a schema in the database rather than setting the profile at the database level.

Cost considerations¶

Sensitive data classification consumes credits as it uses serverless compute resources to classify tables in the database. For more information about pricing for this consumption, see Table 5 in the Snowflake Service Consumption Table.

Note

Classifying views can cost more than classifying tables. The additional cost depends on the complexity of the query that created the view. Materialized views don’t incur these additional costs. By default, views are excluded from classification.

View costs in Snowsight¶

To explore the cost of sensitive data classification:

  1. Sign in to Snowsight.

  2. Switch to a role with access to cost and usage data.

  3. In the navigation menu, select Admin » Cost management.

  4. Select a warehouse to use to view the usage data. Snowflake recommends using an XS warehouse for this purpose.

  5. Select Consumption.

  6. From the Usage Type drop-down, select Compute.

  7. From the Service Type drop-down, select Sensitive Data Classification.

Use SQL to query costs¶

You can query views in the ACCOUNT_USAGE and ORGANIZATION_USAGE schemas to determine how much was spent on automatically classifying sensitive data. To monitor credit consumption, query the following views:

METERING_HISTORY view (ACCOUNT_USAGE)

Lets you retrieve the hourly cost of automatic classification by focusing on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. For example:

SELECT
  service_type,
  start_time,
  end_time,
  entity_id,
  name,
  credits_used_compute,
  credits_used_cloud_services,
  credits_used,
  budget_id
  FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY
  WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION';

METERING_DAILY_HISTORY view (ACCOUNT_USAGE and ORGANIZATION_USAGE)

Lets you retrieve the daily cost of automatic classification by focusing on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. For example:

SELECT
  service_type,
  usage_date,
  credits_used_compute,
  credits_used_cloud_services,
  credits_used
  FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY
  WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION';

USAGE_IN_CURRENCY_DAILY view (ORGANIZATION_USAGE)

Lets you retrieve the daily cost of automatic classification by focusing on SENSITIVE_DATA_CLASSIFICATION in the SERVICE_TYPE column. Use this view to determine the cost in currency, not credits.
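
For example, the following sketch assumes the usage_date, currency, and usage_in_currency columns in that view:

SELECT
  service_type,
  usage_date,
  currency,
  usage_in_currency
  FROM SNOWFLAKE.ORGANIZATION_USAGE.USAGE_IN_CURRENCY_DAILY
  WHERE service_type = 'SENSITIVE_DATA_CLASSIFICATION';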

Supported objects¶

Snowflake supports classifying data stored in all types of Snowflake tables and views.

Snowflake does not support classifying shared tables or shared schemas from the consumer’s side. If a provider creates a table and places it in an outbound share, classification works only when it is run from the provider’s side.

Supported data types¶

You can classify table and view columns of any supported data type except the following:

  • ARRAY

  • BINARY

  • DECFLOAT

  • GEOGRAPHY

  • OBJECT

  • VARIANT (except when the column data type is cast to a NUMBER or STRING data type; see the sketch after this list)

  • VECTOR
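
As a hypothetical sketch of the VARIANT exception above (the view, table, and field names are invented), you can expose VARIANT fields as typed columns so they become eligible for classification; keep in mind that views are excluded from classification by default:

-- Cast semi-structured fields to STRING so classification can evaluate them.
CREATE OR REPLACE VIEW my_db.sch.customer_events_typed AS
  SELECT
    event_payload:email::STRING     AS email,
    event_payload:full_name::STRING AS full_name
  FROM my_db.sch.customer_events;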

Note

  • Unstructured data like long text stored in columns is not supported.

  • JSON, XML, or other semi-structured data is not supported.

Limitations and considerations¶

  • Classification profiles cannot be set on a reader account.

  • A classification profile cannot be set on more than 1,000 databases.

  • A classification profile cannot be directly set on more than 10,000 schemas.

  • A maximum of 100 million tables can be classified in a schema.

  • You cannot automatically classify a table if it has any of the following characteristics:

    • More than 10,000 columns.

    • A column with a name that has more than 255 characters.

    • A column with a name that includes the $ character.