Using custom classifiers to implement custom semantic categories¶
The CUSTOM_CLASSIFIER class allows data engineers to extend their sensitive data classification capabilities based on their own knowledge of their data. To classify sensitive data into custom semantic categories, create an instance of the CUSTOM_CLASSIFIER class in a schema and call instance methods to add regular expressions associated with the instance.
For an end-to-end example of using a CUSTOM_CLASSIFIER instance to create a custom semantic category, see Example.
Commands and methods¶
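The following methods and SQL commands are supported:
custom_classifier!ADD_REGEX
custom_classifier!LIST
custom_classifier!DELETE_CATEGORY
CREATE CUSTOM_CLASSIFIER
DROP CUSTOM_CLASSIFIER
SHOW CUSTOM_CLASSIFIER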
Access control¶
The following sections summarize the roles and grants that you need to use an instance.
Roles¶
You can use the following roles with custom classification:
SNOWFLAKE.CLASSIFICATION_ADMIN: database role that enables you to create a custom classifier instance.
custom_classifier!PRIVACY_USER: instance role that enables you to call the following methods on the instance:
ADD_REGEX
LIST
DELETE_CATEGORY
The account role with the OWNERSHIP privilege on the instance can run these commands:
DROP CUSTOM_CLASSIFIER
SHOW CUSTOM_CLASSIFIER
Grants¶
To create and manage instances, either grant the CREATE SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER privilege to a role or grant the PRIVACY_USER instance role to a role.
You can grant the instance roles to account roles and database roles to enable other users to work with custom classifier instances:
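As a sketch of the general form of these grants, the statements might look like the following. The angle-bracket placeholders are not literal values, and the exact syntax should be confirmed against the CUSTOM_CLASSIFIER reference before use.

```sql
-- Grant the instance role to an account role.
GRANT SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER ROLE <name>!PRIVACY_USER
  TO ROLE <role_name>;

-- Grant the instance role to a database role.
GRANT SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER ROLE <name>!PRIVACY_USER
  TO DATABASE ROLE <database_role_name>;
```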
Where:
name: Specifies the name of the custom classifier instance.
role_name: Specifies the name of an account role.
database_role_name: Specifies the name of a database role.
You must use a warehouse to call methods on the instance.
To grant the custom role my_classification_role the required instance role and privileges to create and use an instance of the
CUSTOM_CLASSIFIER class, execute the following statements:
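A minimal sketch of such a set of grants follows. The database `mydb`, schema `mydb.instances`, and warehouse `wh_classification` are hypothetical names chosen for illustration; substitute the objects in your own account.

```sql
-- Hypothetical objects: mydb, mydb.instances, wh_classification.
GRANT USAGE ON DATABASE mydb TO ROLE my_classification_role;
GRANT USAGE ON SCHEMA mydb.instances TO ROLE my_classification_role;

-- Database role that allows creating custom classifier instances.
GRANT DATABASE ROLE SNOWFLAKE.CLASSIFICATION_ADMIN TO ROLE my_classification_role;

-- Privilege to create instances of the class in the schema.
GRANT CREATE SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER ON SCHEMA mydb.instances
  TO ROLE my_classification_role;

-- A warehouse is required to call methods on the instance.
GRANT USAGE ON WAREHOUSE wh_classification TO ROLE my_classification_role;
```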
To enable a specific role, such as data_analyst, to use a specific instance, do the following:
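For instance, assuming a hypothetical instance named `mydb.sch.my_instance`, the grants for the data_analyst role might be sketched as:

```sql
-- Hypothetical instance: mydb.sch.my_instance.
GRANT USAGE ON DATABASE mydb TO ROLE data_analyst;
GRANT USAGE ON SCHEMA mydb.sch TO ROLE data_analyst;

-- Instance role that allows calling ADD_REGEX, LIST, and DELETE_CATEGORY.
GRANT SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER ROLE mydb.sch.my_instance!PRIVACY_USER
  TO ROLE data_analyst;
```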
Example¶
The high-level approach to classifying data with custom classifiers is as follows:
Identify a table to classify.
Use SQL to do the following:
Create a custom classifier instance.
Add the custom semantic category and regular expressions to the instance.
Classify the table.
Complete these steps to create a custom classifier to classify a table:
Consider a table, data.tables.patient_diagnosis, in which one column contains diagnostic codes, such as ICD-10 codes. This table might also include columns that identify the patients who were treated at a medical facility, such as first and last name, unique health insurance identifiers, and date of birth. The data owner can classify the table to ensure that the columns are tagged correctly so the table can be monitored.
In this example, the data owner already has these privileges granted to their role:
OWNERSHIP on the table to classify.
OWNERSHIP on the schema that contains the table.
USAGE on the database that contains the schema and table.
Enable the data owner to classify the table by granting the SNOWFLAKE.CLASSIFICATION_ADMIN database role to the data owner role:
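Assuming the data owner role is named `data_owner` (a hypothetical name for this example), the grant might look like this:

```sql
-- data_owner is a hypothetical account role name.
GRANT DATABASE ROLE SNOWFLAKE.CLASSIFICATION_ADMIN TO ROLE data_owner;
```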
As the data owner, create a schema to store your custom classifier instances:
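A sketch of this step, again assuming the hypothetical `data_owner` role and using the `data.classifiers` schema name that appears later in this example:

```sql
-- Switch to the (hypothetical) data owner role, then create the schema.
USE ROLE data_owner;
CREATE SCHEMA data.classifiers;
```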
Use the CREATE CUSTOM_CLASSIFIER command to create a custom classifier instance in the data.classifiers schema:
You can optionally update your search path as follows:
Add SNOWFLAKE.DATA_PRIVACY so that you don’t have to specify the fully qualified name of the class when creating a new instance of the class.
Add DATA.CLASSIFIERS so that you don’t have to specify the fully qualified name of the instance when calling a method on the instance or using a command with the instance.
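The instance creation and the optional search path update described above might be sketched as follows. The instance name `medical_codes` is an assumption made for this example, and the `$current, $public` entries reflect a typical default search path rather than anything specific to your account.

```sql
-- Create the instance with fully qualified class and instance names.
-- medical_codes is a hypothetical instance name.
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER data.classifiers.medical_codes();

-- Optional: extend the search path so that the class and the instance
-- can be referenced without fully qualified names in this session.
ALTER SESSION SET SEARCH_PATH = '$current, $public, SNOWFLAKE.DATA_PRIVACY, DATA.CLASSIFIERS';
```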
Use a SHOW CUSTOM_CLASSIFIER command to list each instance that you create. For example:
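A minimal form of the command, using the fully qualified class name so that it does not depend on the search path:

```sql
-- Lists the custom classifier instances that the current role can see.
SHOW SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER;
```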
Call the custom_classifier!ADD_REGEX method on the instance to specify the system tags and regular expression to identify ICD-10 codes in a column. The regular expression in this example matches all possible ICD-10 codes. The regular expression to match the column name, ICD.*, and the comment are optional:
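As a sketch, the call might look like the following. The instance name `medical_codes`, the category name `ICD_10_CODES`, the `IDENTIFIER` privacy category, and the value pattern are assumptions made for illustration; the pattern is a simplified approximation and should be validated against your own data before use.

```sql
CALL data.classifiers.medical_codes!ADD_REGEX(
  'ICD_10_CODES',                                -- semantic category (hypothetical name)
  'IDENTIFIER',                                  -- privacy category
  '[A-TV-Z][0-9][0-9AB][.]?[0-9A-TV-Z]{0,4}',    -- value pattern (illustrative)
  'ICD.*',                                       -- optional: column name pattern
  'Regex to identify ICD-10 medical codes'       -- optional: comment
);
```

The pattern uses `[.]` rather than an escaped period to avoid backslash handling in single-quoted string constants.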
Tip
Test the regular expression before adding it to the custom classifier instance. For example:
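One way to run such a test is with REGEXP_LIKE, which matches the pattern against the entire value. The column name `icd_10_code` is hypothetical, and the pattern is the same illustrative one assumed for this example:

```sql
-- Returns only the values that match the candidate pattern in full.
SELECT icd_10_code
FROM data.tables.patient_diagnosis
WHERE REGEXP_LIKE(icd_10_code, '[A-TV-Z][0-9][0-9AB][.]?[0-9A-TV-Z]{0,4}');
```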
In this query, only valid values that match the regular expression are returned. The query does not return invalid values such as xyz.
For details, see String functions (regular expressions).
Call the custom_classifier!LIST method on the instance to verify the regular expression that you added to the instance:
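Assuming the hypothetical `medical_codes` instance in the `data.classifiers` schema, the call might look like this:

```sql
-- Returns the categories and regular expressions stored in the instance.
SELECT data.classifiers.medical_codes!LIST();
```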
To remove a category, call the custom_classifier!DELETE_CATEGORY method on the instance.
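A sketch of removing a category, assuming the method takes the category name as its single argument and that the category was added as `ICD_10_CODES` (both assumptions for this example):

```sql
-- Removes the ICD_10_CODES category and its regular expressions from the instance.
CALL data.classifiers.medical_codes!DELETE_CATEGORY('ICD_10_CODES');
```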
Call the SYSTEM$CLASSIFY_SCHEMA stored procedure to classify the table.
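The call might be sketched as follows. The option keys `auto_tag` and `custom_classifiers` are assumptions about the options object, and `medical_codes` is the hypothetical instance name used in this example; check the stored procedure reference for the options that your account supports.

```sql
-- Classifies the tables in the data.tables schema, including patient_diagnosis.
-- Option keys are assumed; verify them against the procedure reference.
CALL SYSTEM$CLASSIFY_SCHEMA(
  'data.tables',
  {'auto_tag': true, 'custom_classifiers': ['data.classifiers.medical_codes']}
);
```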
If the instance is no longer needed, use the DROP CUSTOM_CLASSIFIER command to remove a custom classifier instance from the system:
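For the hypothetical `medical_codes` instance, the command might look like this:

```sql
DROP SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER data.classifiers.medical_codes;
```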
Auditing custom classifiers¶
You can use the following queries to audit the creation of custom classifier instances, the addition of regular expressions to instances, and the dropping of instances.
To audit the creation of custom classifier instances, use the following query:
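One possible form of this query searches the ACCOUNT_USAGE query history for CREATE statements that reference the class. The ILIKE pattern is an assumption about how the statements were written, so adjust it if your statements omit the fully qualified class name.

```sql
SELECT
  qh.user_name,
  qh.role_name,
  qh.query_text,
  qh.query_id
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY AS qh
WHERE qh.query_text ILIKE 'create % snowflake.data_privacy.custom_classifier%';
```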
To audit adding regular expressions to a specific instance, use the following query and replace DB.SCH.MY_INSTANCE with the name of the instance that you want to audit:
To audit dropping a custom classifier instance, use the following query:
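The two audits described above (ADD_REGEX calls on a specific instance, and dropped instances) might be sketched as the following pair of queries. As with any text search over query history, the ILIKE patterns are assumptions about how the original statements were written.

```sql
-- Audit ADD_REGEX calls on a specific instance.
-- Replace db.sch.my_instance with the instance that you want to audit.
SELECT
  qh.user_name,
  qh.role_name,
  qh.query_text,
  qh.query_id
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY AS qh
WHERE qh.query_text ILIKE 'call% db.sch.my_instance!add_regex(%';

-- Audit DROP statements for custom classifier instances.
SELECT
  qh.user_name,
  qh.role_name,
  qh.query_text,
  qh.query_id
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY AS qh
WHERE qh.query_text ILIKE 'drop % snowflake.data_privacy.custom_classifier%';
```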