Tutorial: Automatically classify and tag sensitive data
Introduction
Identifying and tracking your sensitive data is simple and straightforward. Snowflake provides a built-in algorithm to identify your sensitive data and automatically tag that data with system tags to help track the type of data and how sensitive it is.
With minimal setup, you can also configure a database so Snowflake automatically performs this classification process for new and changing data and applies user-defined tags along with the system tags.
In this tutorial, you’ll do the following:
- Set up the resources you need to complete the tutorial, including a user-defined tag that is applied to the sensitive data.
- Create a classification profile, which Snowflake uses to automatically classify data when it’s added to a database.
- Add a tag map to the classification profile so the user-defined tag is applied to data that Snowflake identifies as sensitive.
- View the results of the classification.
Set up governance database
In this tutorial, you’ll create the Snowflake objects (a user-defined tag and a classification profile) needed to govern your data. As a best practice, these objects are created in a database dedicated to governance.
Open a SQL worksheet, and then execute the following statements to create a database and schema for the governance objects:
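A minimal sketch of these statements, assuming the database and schema are named `governance_db` and `sch` (the names that appear in the fully qualified tag name later in this tutorial):

```sql
-- Use the ACCOUNTADMIN role for the rest of the tutorial.
USE ROLE ACCOUNTADMIN;

-- Database and schema dedicated to governance objects
-- (the user-defined tag and the classification profile).
CREATE DATABASE IF NOT EXISTS governance_db;
CREATE SCHEMA IF NOT EXISTS governance_db.sch;
```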
Note
For simplicity, you will use the ACCOUNTADMIN system role to avoid setting up the privileges needed to configure sensitive data classification. In practice, you should not use this powerful role but rather create custom roles with the required privileges.
Set up your data
Before setting up the data for this tutorial, create a warehouse to populate a table:
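For example, a small warehouse along the following lines should be enough. The name `tutorial_wh` and the size and suspend settings are assumptions made for illustration, not values prescribed by the tutorial:

```sql
-- Hypothetical warehouse name; any small warehouse can populate the table.
CREATE WAREHOUSE IF NOT EXISTS tutorial_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60      -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE;

-- Make it the active warehouse for the current session.
USE WAREHOUSE tutorial_wh;
```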
Create a table
- Create the database and schema that will contain the table to be classified.
- Create the table structure that will contain the sensitive data.
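The two steps above can be sketched as follows. The object names `tutorial_db`, `sch`, and `data_table`, and the `first_name` and `last_name` columns, are hypothetical placeholders. Only the `ACCOUNT_NUMBER` and `EMAIL` columns, plus the presence of name columns, are implied by the results discussed later:

```sql
-- Step 1: database and schema that hold the table to be classified
-- (hypothetical names).
CREATE DATABASE IF NOT EXISTS tutorial_db;
CREATE SCHEMA IF NOT EXISTS tutorial_db.sch;

-- Step 2: table structure for the sensitive data.
CREATE TABLE IF NOT EXISTS tutorial_db.sch.data_table (
  account_number NUMBER(38,0),
  first_name     VARCHAR,   -- assumed column; expected to classify as NAME
  last_name      VARCHAR,   -- assumed column; expected to classify as NAME
  email          VARCHAR
);
```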
Insert values into the table
Add data to the table you created:
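As an illustration, assuming the hypothetical table `tutorial_db.sch.data_table` with columns `account_number`, `first_name`, `last_name`, and `email`, an insert might look like this. The rows are made-up sample values, and a real table would likely need more rows for classification to be reliable:

```sql
-- Made-up sample rows for illustration only.
INSERT INTO tutorial_db.sch.data_table
  (account_number, first_name, last_name, email)
VALUES
  (1589420, 'Jane',  'Doe',     'jane.doe@example.com'),
  (2834123, 'John',  'Smith',   'john.smith@example.com'),
  (4829381, 'Maria', 'Garcia',  'maria.garcia@example.com'),
  (9821802, 'Wei',   'Chen',    'wei.chen@example.com'),
  (8028387, 'Aisha', 'Khan',    'aisha.khan@example.com');
```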
Create a classification profile
Great, you now have a table full of data that you need to classify to help protect your sensitive data. Because you want Snowflake to automatically classify data when it is added to a database, you’ll need to create a classification profile.
A classification profile controls how often data in a database is classified, along with what happens during that classification process. Every classification profile is an instance of the CLASSIFICATION_PROFILE class.
To create the classification profile for your database, run the following command:
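A sketch of such a command, using the four settings this section describes. The profile name `tutorial_profile` is a hypothetical placeholder, and the profile is assumed to live in the `governance_db.sch` schema:

```sql
-- Create an instance of the CLASSIFICATION_PROFILE class.
-- The profile name is hypothetical; the keys are the settings explained below.
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
  governance_db.sch.tutorial_profile(
    {
      'minimum_object_age_for_classification_days': 0,
      'maximum_classification_validity_days': 30,
      'auto_tag': true,
      'classify_views': true
    });
```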
When this classification profile is set on your database, the following happens:
- Classification starts in less than one day ('minimum_object_age_for_classification_days': 0).
- After the initial classification, Snowflake rechecks every 30 days to see if tables need to be reclassified ('maximum_classification_validity_days': 30).
- Classification tags are automatically set on columns identified as containing sensitive data ('auto_tag': true).
- Snowflake classifies data in tables and views ('classify_views': true).
Add tag map to classification profile
Because you set 'auto_tag': true in your classification profile, Snowflake will automatically apply system classification tags when it classifies data as being sensitive. The SEMANTIC_CATEGORY tag classifies the type of
data, for example identifying the data as a name or address. The PRIVACY_CATEGORY tag classifies the sensitivity of the data, for
example identifying the data as an identifier or quasi-identifier.
Now suppose you want to go one step further and automatically apply your own user-defined tag based on how data is classified. This tutorial shows you how!
To create the custom tag that you want applied to sensitive data, execute the following statement:
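For instance, using the fully qualified tag name `governance_db.sch.tutorial_pii` that this tutorial refers to later:

```sql
-- User-defined tag that the tag map applies to sensitive columns.
CREATE OR REPLACE TAG governance_db.sch.tutorial_pii;
```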
Next, you’ll modify the classification profile so this user-defined tag gets applied when Snowflake identifies that a column contains names. Adding a tag map to the classification profile configures how and when the user-defined tag gets applied.
To add the tag map to your classification profile, execute the classification_profile_name!SET_TAG_MAP method:
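A sketch of that call, assuming the classification profile is named `governance_db.sch.tutorial_profile` (a hypothetical name). The tag name, tag value, and semantic category are the ones this section describes:

```sql
-- Map the NAME semantic category to the user-defined tag.
-- The profile name is a placeholder for your own profile.
CALL governance_db.sch.tutorial_profile!SET_TAG_MAP(
  {'column_tag_map': [
    {
      'tag_name': 'governance_db.sch.tutorial_pii',
      'tag_value': 'sensitive_name',
      'semantic_categories': ['NAME']
    }
  ]});
```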
Now, if sensitive data classification determines the system-defined semantic category is NAME, then the user-defined tag tutorial_pii is
set on the column. Based on the classification profile, the value of the user-defined tutorial_pii tag is set to sensitive_name.
Note
You can also define a tag map when creating the classification profile.
Set classification profile on a database
You have your classification profile configured, so you’re ready to set it on the database. This starts the automatic classification process.
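One way to do this, assuming a hypothetical data database named `tutorial_db` and a hypothetical profile named `governance_db.sch.tutorial_profile`:

```sql
-- Attach the classification profile to the database that holds the data.
-- Both object names are placeholders; substitute your own.
ALTER DATABASE tutorial_db
  SET CLASSIFICATION_PROFILE = 'governance_db.sch.tutorial_profile';
```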
That’s it. Snowflake does the rest! It starts classifying the existing data and classifies new data when it is added to the database.
View classification results
Before completing this part of the tutorial, you’ll have to wait one hour for Snowflake to complete the classification process.
After one hour, execute the following statement to retrieve the results of the classification:
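A sketch of such a statement, assuming the classified table is the hypothetical `tutorial_db.sch.data_table`:

```sql
-- Return the classification result for the table as a JSON object.
-- The table name is a placeholder; substitute your own.
CALL SYSTEM$GET_CLASSIFICATION_RESULT('tutorial_db.sch.data_table');
```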
In the results, notice the following:
- The ACCOUNT_NUMBER column was not classified as sensitive, so it wasn’t assigned classification tags.
- The EMAIL column was flagged as having a semantic category of EMAIL and a privacy category of IDENTIFIER.
- Based on the tag map of the classification profile, the governance_db.sch.tutorial_pii user-defined tag was assigned to columns that had a semantic category of NAME (see highlighted lines in output).
Clean up, summary, and additional resources
Congratulations! You’ve successfully completed this tutorial.
In summary, you learned how to do the following:
- Create a classification profile to control how automatic classification is implemented.
- Add a tag map to the classification profile so user-defined tags are automatically set on columns containing sensitive data.
- Set the classification profile on a database to kick off automatic classification.
- View the results of automatic classification.
Drop the tutorial objects
If you plan to repeat the tutorial, you can keep the objects that you created.
Otherwise, drop the tutorial objects as follows:
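A possible cleanup script is shown here. `tutorial_db` and `tutorial_wh` are hypothetical names for the data database and the warehouse; `governance_db` is the governance database that contains the tag and the classification profile:

```sql
USE ROLE ACCOUNTADMIN;

-- Drop the data database first (hypothetical name), then the governance
-- database, which removes the tag and the classification profile with it.
DROP DATABASE IF EXISTS tutorial_db;
DROP DATABASE IF EXISTS governance_db;

-- Drop the warehouse used to populate the table (hypothetical name).
DROP WAREHOUSE IF EXISTS tutorial_wh;
```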
What’s next?
For complete details about implementing automatic sensitive data classification, including associated costs and implementing custom classification, see Use SQL to set up sensitive data classification.