Use Data Classification with the classic APIs¶
This topic provides examples of classifying data with the classic APIs in Snowflake: EXTRACT_SEMANTIC_CATEGORIES and ASSOCIATE_SEMANTIC_CATEGORY_TAGS. These APIs are no longer being updated to coincide with updates to Data Classification.
You can continue to use these APIs. However, you should update your workflows to use the approaches as shown in Use Data Classification.
Overview¶
The following examples for classifying data address different use cases based on scope:
If you are a new Classification user, start with classifying a single table and then proceed to the other two examples.
Common table and roles¶
The examples in this topic use the table and custom roles shown below:
Table:
my_db.my_schema.hr_data
, which contains data about employees.Roles:
data_engineer
: classifies tables.policy_admin
: protects personal information with a masking policy.analyst
: serves as an example of a custom role for which you might want to restrict access. The examples assume this role exists.-
GOVERNANCE_ADMIN: assigns the Classification system tags.
GOVERNANCE_VIEWER: queries the Account Usage views relating to tags and masking policies.
Create the table:
CREATE OR REPLACE TABLE hr_data ( age INTEGER, email_address VARCHAR, fname VARCHAR, lname VARCHAR );
Load the table (details not shown).
Create the custom roles and grant the necessary privileges to those roles.
The
data_engineer
role needs access to the table to run the Classification process. Note that this example grants the GOVERNANCE_ADMIN database role to thedata_engineer
role because you can not switch roles to a database role:use role accountadmin; create role data_engineer; grant usage on database my_db to role data_engineer; grant usage on schema my_db.my_schema to role data_engineer; grant select, update on table my_db.my_schema.hr_data to role data_engineer; grant database role snowflake.governance_admin to role data_engineer;
The
policy_admin
custom role needs to assign a masking policy to any column containing PII data:use role accountadmin; create role policy_admin; grant apply masking policy on account to role policy_admin; grant database role snowflake.governance_viewer to role policy_admin;
Classify a single table¶
The single table example expands on the three-step Classification process (i.e. analyze, review, and apply) to apply the results to
revoke table access from the analyst
custom role.
Step 1: Classify the table columns¶
In this step, the data_engineer
runs the Classification process and the policy_admin
protects the column data with a
masking policy.
Analyze: Use the
data_engineer
custom role to call the EXTRACT_SEMANTIC_CATEGORIES function to classify the columns in the table namedmy_db.my_schema.hr_data
:USE ROLE data_engineer; SELECT EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.hr_data');
Review: The data engineer reviews the results to ensure they make sense.
Apply: The data engineer assigns a Classification system tag to the columns.
The data engineer has two options: automated assignment or manual assignment.
To assign the system tags automatically, call the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure. Note:
The fully-qualified name of the table and the function from the first step are arguments for the stored procedure.
The stored procedure reruns the EXTRACT_SEMANTIC_CATEGORIES function. If you want to preserve the results from the first step, save the results to a table prior to calling the stored procedure.
CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS( 'my_db.my_schema.hr_data', EXTRACT_SEMANTIC_CATEGORIES('my_db.my_schema.hr_data') );
If the stored procedure executes successfully, it returns a message similar to the following:
Applied tag semantic_category to <n> columns. Applied tag privacy_category to <n> columns.
Otherwise, when the stored procedure doesn’t execute or if the decision is to assign the Classification system tag to each column as needed manually, use an ALTER TABLE … ALTER COLUMN statement. For example, assign either system tag to the FNAME column (i.e. first name).
USE ROLE data_engineer; ALTER TABLE my_db.my_schema.hr_data MODIFY COLUMN fname SET TAG SNOWFLAKE.CORE.SEMANTIC_CATEGORY='NAME';
or
ALTER TABLE my_db.my_schema.hr_data MODIFY COLUMN fname SET TAG SNOWFLAKE.CORE.PRIVACY_CATEGORY='IDENTIFIER';
Step 2: Protect the table columns¶
This step assumes that several columns in the my_db.my_schema.hr_data
table have the PRIVACY_CATEGORY = 'IDENTIFIER'
tag
assigned to these columns and that there is a need to protect these columns with a masking policy.
To protect these columns:
Use the
policy_admin
role to find the columns that have the IDENTIFIER privacy tag applied:USE ROLE policy_admin; SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES WHERE TAG_NAME = 'PRIVACY_CATEGORY' AND TAG_VALUE = 'IDENTIFIER';
The latency for the TAG_REFERENCES view can be up to 120 minutes. If you need results sooner and know the name of the column for which you are querying the classification tags, you can use the TAG_REFERENCES or TAG_REFERENCES_ALL_COLUMNS table functions instead.
Use the
policy_admin
role to apply masking policies to the appropriate columns. For example, the following statement applies theidentifier_mask
masking policy to thefname
column:ALTER TABLE my_db.my_schema.hr_data MODIFY COLUMN fname SET MASKING POLICY governance.policies.identifier_mask;
Classify all tables in a schema¶
This example shows how to classify all the tables in a schema using two user-defined stored procedures:
classify_schema
: Lists all the tables in the schema, creates a table to store the classification results, and then extracts the classification tags from each table and stores them in the results table.associate_tag_batch
: Uses the results of theclassify_schema
stored procedure to assign the Classification system tags to all table columns in the schema automatically and returns the number of tags assigned to each table.
Important
The stored procedure named classify_schema
creates a temporary table to store results. The temporary table exists for the
duration of the user session for the user that calls this stored procedure. When the user session expires, Snowflake drops the temporary
table and the user needs to call the stored procedure again to recreate the temporary table.
If the temporary table needs to be preserved, remove the temp
keyword from the sqlText
command to create the table.
For more details, see the TEMP[ORARY]
option in the CREATE TABLE command.
Create the first procedure named
classify_schema
:create or replace procedure classify_schema(schema_name string, result_table string) returns object language JavaScript as $$ // 1 const table_names = snowflake.execute({ sqlText: `show terse tables in schema identifier(?)`, binds: [SCHEMA_NAME], }); // helper function for quoted table names function quote(name) { return '"'+ name.replace(/"/g,'""') + '"'; } // create table to store results in. if it already exists, we will add to it rather than overwrite snowflake.execute({ sqlText: `create temp table if not exists identifier(?) (table_name string, result variant)`, binds: [RESULT_TABLE], }) // loop through tables while (table_names.next()) { let name = table_names.getColumnValue('name'); // add schema to table name name = SCHEMA_NAME + "." + quote(name); // insert qualified table name and result into result table const results = snowflake.execute({ sqlText: `insert into identifier(?) select ?, extract_semantic_categories(?)`, binds: [RESULT_TABLE, name, name], }); } // return the number of tables classified return {tables_classified: table_names.getRowCount()}; $$;
Create the second procedure named
associate_tag_batch
:create or replace procedure associate_tag_batch(result_table string) returns Object language JavaScript as $$ // get table names and classification results to loop through const tags_to_apply = snowflake.execute({ sqlText: `select table_name, result from identifier(?)`, binds: [RESULT_TABLE], }); const out = {}; while (tags_to_apply.next()) { // get table name const name = tags_to_apply.getColumnValue('TABLE_NAME'); // get classification result const classification_results = tags_to_apply.getColumnValue('RESULT'); // call associate semantic category tags with table name and classification result const results = snowflake.execute({ sqlText: `call associate_semantic_category_tags(?, parse_json(?))`, binds: [name, JSON.stringify(classification_results)], }); results.next(); out[name] = results.getColumnValue(1).split('\n'); } // return number of tags applied per table return out; $$;
Call the
classify_schema
stored procedure, using the name of the schema you would like to classify and the name of a temporary table to hold the results of EXTRACT_SEMANTIC_CATEGORY for each table as the arguments:call classify_schema('my_db.my_schema','my_temporary_classification_table');
Review the results in the temporary table and modify as necessary.
When satisfied with the results, call the
associate_tag_batch
stored procedure to assign the Classification system tags to the table columns:call associate_tag_batch('my_temporary_classification_table');
Classify all tables in a database¶
This example shows how to classify all the tables in a database by using two stored procedures:
classify_database
: Classifies all tables within a schema for each schema in the database, and returns the number of tables and the number of schemas that are classified.associate_tag_batch
: Performs the same operations as defined in Classify all tables in a schema (in this topic).
Create the
classify_database
stored procedure:create or replace procedure classify_database(database_name string, result_table string) returns Object language JavaScript as $$ // get list of schemas in database const schema_names = snowflake.execute({ sqlText: `show terse schemas in database identifier(?)`, binds: [DATABASE_NAME], }); // helper function for quoted schema names function quote(name) { return '"'+ name.replace(/"/g,'""') + '"'; } // counter for tables. will use result from classify_schema to increment let table_count = 0 while (schema_names.next()) { let name = schema_names.getColumnValue('name'); // skip the information schema if (name == "INFORMATION_SCHEMA") { continue; } // add database name to schema name = DATABASE_NAME + "." + quote(name); // call classify_schema on each schema. This will loop over tables in schema const results = snowflake.execute({ sqlText: `call classify_schema(?, ?)`, binds: [name, RESULT_TABLE], }); results.next(); // increment total number of tables by the number of tables in the schema table_count += results.getColumnValue(1).tables_classified ; } return { tables_classified: table_count, // subtract one from number of schemas because we skip the information schema schemas_classified: schema_names.getRowCount() - 1, }; $$;
Call the
classify_database
stored procedure with the name of the database you wish to classify and the name of the temporary table to store the results in each schema of the database:call classify_database('my_db','my_temporary_classification_table');
Navigate to each schema and review the temporary table and revise if necessary.
When satisfied with the results, call the
associate_tag_batch
stored procedure once for each schema to apply the tags to the tables in that schema. For example, if the database contains three schemas, call the stored procedure three times:call associate_tag_batch('my_temporary_classification_table');