Use SQL to set up sensitive data classification¶
The following sections describe how to use SQL to set up the automatic classification of sensitive data within a database. If you want to use a web interface to set up sensitive data classification, see Use the Trust Center to set up sensitive data classification.
The basic workflow for using SQL to classify sensitive data consists of the following steps:
Create a classification profile that controls what happens during sensitive data classification.
Set the classification profile on a database or schema to automatically classify tables in the entity.
For end-to-end examples of this workflow, see Examples.
About classification profiles¶
A classification profile defines the criteria that are used to automatically classify tables in a database. This criteria includes:
How long a table should exist before automatically classifying it.
How long before previously classified tables should be reclassified.
Whether system and custom tags are automatically set on columns after the classification. You can decide whether you want Snowflake to automatically apply suggested tags or prefer to review proposed tag assignments, then apply them yourself.
A mapping between system classification tags and user-defined object tags so the user-defined tags can be applied automatically.
Whether custom classifiers are used to classify data.
When a data engineer assigns the classification profile to a database, sensitive data in the tables that belong to the database is automatically classified on the schedule defined by the profile. A data engineer can assign the same classification profile to multiple databases, or create multiple classification profiles to set different classification criteria for different databases.
To use SQL to create a classification profile, run the CREATE CLASSIFICATION_PROFILE command to create an instance of the CLASSIFICATION_PROFILE class.
For an example of using the CREATE CLASSIFICATION_PROFILE command to create a classification profile, see Examples.
About tag mapping¶
You can use the classification profile to map SEMANTIC_CATEGORY system tags to one or more object tags. With this tag mapping, a column with sensitive data can be automatically assigned a user-defined tag based on its classification. The tag map can be added while creating the classification profile or later by calling the <classification_profile_name>!SET_TAG_MAP method.
Regardless of whether you are defining the tag map while creating the classification profile or after, the contents of the map are specified
as a JSON object. This JSON object contains the 'column_tag_map' key, which is an array of objects that specify a user-defined tag,
the string value of that tag, and the semantic categories to which the tag is being mapped.
The following is an example of a tag map:
'tag_map': {
'column_tag_map': [
{
'tag_name':'tag_db.sch.pii',
'tag_value':'Highly Confidential',
'semantic_categories':[
'NAME',
'NATIONAL_IDENTIFIER'
]
},
{
'tag_name': 'tag_db.sch.pii',
'tag_value':'Confidential',
'semantic_categories': [
'EMAIL'
]
}
]
}
Based on this mapping, if you have a column of email addresses and the classification process determines that the column contains these
addresses, the tag_db.sch.pii = 'Confidential' tag is set on the column containing the email addresses.
If your tag map includes multiple JSON objects that map tags, tag values, and category values, the order of the JSON objects determines which tag and value to set on the column if there is a conflict. Specify the JSON objects in the desired assignment order from left to right, or top to bottom if you are formatting JSON.
Tip
Each object in the column_tag_map field has only has one required key: tag_name. If you omit the tag_value and
semantic_categories keys, the user-defined tag gets applied to every column to which the SEMANTIC_CATEGORY system tag is applied,
and the value of the user-defined tag will match the value of the SEMANTIC_CATEGORY tag for a given column.
If there is a conflict with a manually assigned tag and a tag applied by automatic classification, an error occurs. For information about tracking these errors, see Troubleshooting sensitive data classification.
Set a classification profile on a database¶
Implement sensitive data classification by setting a classification profile on a database. After you set the classification profile on the database, all tables and views within that database are automatically monitored by sensitive data classification.
You can also set a classification on a schema. If you set a classification profile on a schema that exists within a database that is also associated with a classification profile, the profile set on the schema overrides the profile set on the database.
To set a classification profile, use an ALTER DATABASE or ALTER SCHEMA command to set
the CLASSIFICATION_PROFILE parameter. For example, to set a classification profile my_profile so all tables and views in the my_db
database are monitored by sensitive data classification, run the following command:
ALTER DATABASE my_db
SET CLASSIFICATION_PROFILE = 'governance_db.classify_sch.my_profile';
Access control¶
Here are the privileges and roles that let you work with classification profiles and enable sensitive data classification.
Task |
Required privileges/roles |
Notes |
|---|---|---|
Create a classification profile |
SNOWFLAKE.CLASSIFICATION_ADMIN database role |
For information about granting this database role to other roles, see Using SNOWFLAKE database roles. |
CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE on schema |
You need this privilege on the schema where you want to create the classification profile instance. |
|
USAGE on database and schema |
You need privileges on the schema where you want to create the classification profile instance. |
|
Set the classification profile on a database/schema |
One of the following:
|
By default, the owner of the database/schema has the EXECUTE AUTO CLASSIFICATION privilege on it. |
Any privilege on schema’s database |
If setting a classification profile on a schema, you need at least one privilege on the database that contains that schema. |
|
Any privilege on database/schema |
You need at least one privilege on the database/schema that contains the table that you want to automatically classify. The EXECUTE AUTO CLASSIFICATION privilege meets this requirement. |
|
One of the following:
|
For information about granting the PRIVACY_USER instance role to other roles, see Instance roles. |
|
APPLY TAG on Account |
||
Call methods on a classification profile instance |
<classification_profile>!PRIVACY_USER instance role |
For information about granting this instance role to other roles, see Instance roles. |
List classification profiles |
<classification_profile>!PRIVACY_USER instance role |
|
Drop classification profiles |
OWNERSHIP on classification profile instance |
For an example of granting these privileges and database roles to the role of a data engineer, see Basic example: Automatically classifying tables in a database.
Examples¶
Basic example: Automatically classifying tables in a database
Example: Testing a classification profile before enabling automatic classification
Basic example: Automatically classifying tables in a database¶
Complete these steps to automatically classify a table in the database:
As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a database.
USE ROLE ACCOUNTADMIN; GRANT USAGE ON DATABASE mydb TO ROLE data_engineer; GRANT EXECUTE AUTO CLASSIFICATION ON DATABASE mydb TO ROLE data_engineer; GRANT DATABASE ROLE SNOWFLAKE.CLASSIFICATION_ADMIN TO ROLE data_engineer; GRANT CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE ON SCHEMA mydb.sch TO ROLE data_engineer; GRANT APPLY TAG ON ACCOUNT TO ROLE data_engineer;
Switch to the data engineer role:
USE ROLE data_engineer;
Create the classification profile as an instance of the CLASSIFICATION_PROFILE class:
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile( { 'minimum_object_age_for_classification_days': 0, 'maximum_classification_validity_days': 30, 'auto_tag': true, 'classify_views': true });
Call the DESCRIBE method on the instance to confirm its properties:
SELECT my_classification_profile!DESCRIBE();
Set the classification profile instance on the schema, which starts the background process of monitoring tables in the schema and automatically classifying them for sensitive data.
ALTER DATABASE mydb SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
Note
There is a one-hour delay between setting the classification profile on the schema and Snowflake beginning to classify the schema.
After waiting one hour, call the SYSTEM$GET_CLASSIFICATION_RESULT stored procedure to obtain the results of the automatic classification.
CALL SYSTEM$GET_CLASSIFICATION_RESULT('mydb.sch.t1');
Example: Using a tag map and custom classifiers¶
As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a database and set tags on columns.
Create the classification profile.
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile( { 'minimum_object_age_for_classification_days': 0, 'maximum_classification_validity_days': 30, 'auto_tag': true, 'classify_views': true });
Call the SET_TAG_MAP method on the instance to add a tag map to the classification profile. This allows custom tags to be automatically applied on columns that contain sensitive data.
CALL my_classification_profile!SET_TAG_MAP( {'column_tag_map':[ { 'tag_name':'my_db.sch1.pii', 'tag_value':'sensitive', 'semantic_categories':['NAME'] }]});
Alternatively, you could have added this tag map when you created the classification profile.
Call the SET_CUSTOM_CLASSIFIERS method to add custom classifiers to the classification profile. This allows sensitive data to be automatically classified with user-defined semantic and privacy categories.
CALL my_classification_profile!set_custom_classifiers( { 'medical_codes': medical_codes!list(), 'finance_codes': finance_codes!list() });
Alternatively, you could have added the custom classifiers when you created the classification profile.
Call the DESCRIBE method on the instance to confirm that the tag map and custom classifiers have been added to the classification profile.
SELECT my_classification_profile!DESCRIBE();
Set the classification profile instance on the database.
ALTER DATABASE mydb SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';
Attach a masking policy to the
tag_db.sch.piitag to enable tag-based masking.ALTER TAG tag_db.sch.pii SET MASKING POLICY pii_mask;
Example: Testing a classification profile before enabling automatic classification¶
As an administrator, give the data engineer the roles and privileges they need to automatically classify tables in a schema and set tags on columns.
Create the classification profile with a tag map and custom classifiers:
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE my_classification_profile( { 'minimum_object_age_for_classification_days':0, 'auto_tag':true, 'tag_map': { 'column_tag_map':[ { 'tag_name':'tag_db.sch.pii', 'tag_value':'highly sensitive', 'semantic_categories':['NAME','NATIONAL_IDENTIFIER'] }, { 'tag_name':'tag_db.sch.pii', 'tag_value':'sensitive', 'semantic_categories':['EMAIL','MEDICAL_CODE'] } ] }, 'classify_views': true 'custom_classifiers': { 'medical_codes': medical_codes!list(), 'finance_codes': finance_codes!list() } } );
Call the SYSTEM$CLASSIFY stored procedure to test the tag mappings on the
table1table before enabling automatic classification.CALL SYSTEM$CLASSIFY( 'db.sch.table1', 'db.sch.my_classification_profile' );
The
tagskey in the output contains the details about whether the tag was set (trueif set,falseotherwise), the name of the tag that was set, and the value of the tag:{ "classification_profile_config": { "classification_profile_name": "db.schema.my_classification_profile" }, "classification_result": { "EMAIL": { "alternates": [], "recommendation": { "confidence": "HIGH", "coverage": 1, "details": [], "privacy_category": "IDENTIFIER", "semantic_category": "EMAIL", "tags": [ { "tag_applied": true, "tag_name": "snowflake.core.semantic_category", "tag_value": "EMAIL" }, { "tag_applied": true, "tag_name": "snowflake.core.privacy_category", "tag_value": "IDENTIFIER" }, { "tag_applied": true, "tag_name": "tag_db.sch.pii", "tag_value": "sensitive" } ] }, "valid_value_ratio": 1 }, "FIRST_NAME": { "alternates": [], "recommendation": { "confidence": "HIGH", "coverage": 1, "details": [], "privacy_category": "IDENTIFIER", "semantic_category": "NAME", "tags": [ { "tag_applied": true, "tag_name": "snowflake.core.semantic_category", "tag_value": "NAME" }, { "tag_applied": true, "tag_name": "snowflake.core.privacy_category", "tag_value": "IDENTIFIER" }, { "tag_applied": true, "tag_name": "tag_db.sch.pii", "tag_value": "highly sensitive" } ] }, "valid_value_ratio": 1 } } }
Having verified that automatic classification based on the classification profile will have the desired result, set the classification profile instance on the database.
ALTER DATABASE mydb SET CLASSIFICATION_PROFILE = 'mydb.sch.my_classification_profile';