Use Data Classification

This topic provides information on how to classify tables in a schema, review the classification results, and set system tags on the columns in the tables by using SQL or Snowsight.

Overview

A data steward can classify tables in a schema to meet data privacy regulations. These tables include the following scenarios:

  • New tables in a schema.

  • New columns or changed columns in a table within the same schema.

  • Previously classified tables that might need to be classified again.

Evaluating these scenarios enables the data steward to classify sensitive and personal information. Setting system tags on the column can facilitate data monitoring. Subsequently, a data engineer can protect the personal and sensitive data with a masking policy or row access policy in Snowsight or using SQL.

Get started classifying data

Before you classify a single table or tables in a schema:

  • Choose a workflow:

  • Decide on the warehouse to use. For example, use a warehouse that matches the cost center or business unit to which you belong to enable accurate reporting and budgeting.

  • Factor latency into your workflow. Snowsight updates the Databases area of Snowsight every 12 hours, at which point the updated tables can be classified.

  • Confirm the access control configuration for your current role:

    • Use a SHOW GRANTS TO ROLE statement to see the grants on objects a user with that role can access. The role in use must have these grants as a minimum:

      • USAGE on the warehouse to use during the classification process.

      • SELECT on the table to start the classification process.

      • The SNOWFLAKE.CORE_VIEWER database role, which allows setting the system tags on columns.

      • The SNOWFLAKE.GOVERNANCE_VIEWER database role to query the Account Usage DATA_CLASSIFICATION_LATEST view.

    • For other combinations of grants, see the Data Classification privilege reference.

    • The access control model allows for different personas to participate in the workflows. For example, a data steward can stage the classification of tables in a schema, and the data engineer can evaluate the results in Snowsight. Choose the access control model and workflow that works best for you.

Using SQL to classify a single table

This approach uses SQL to classify a table and set system tags on columns in the tables. Review the Get started classifying data section to ensure that the role in use has the necessary access control grants to classify the tables in the schema.

This procedure assumes that you use a role with the appropriate grants. Complete the following steps to use SQL to classify tables in a schema and set system tags on the table columns:

  1. Identify a table to classify.

  2. Call the SYSTEM$CLASSIFY stored procedure to classify and tag the columns in the table:

    CALL SYSTEM$CLASSIFY('hr.tables.empl_info', {'auto_tag': true});
    
    Copy
  3. Call the TAG_REFERENCES_ALL_COLUMNS Information Schema table function to confirm the tag assignments on the columns in the table:

    SELECT *
    FROM TABLE(
      hr.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
        'hr.tables.empl_info',
        'table'
    ));
    
    Copy
  4. Afterward, you can query the Account Usage DATA_CLASSIFICATION_LATEST view to see the latest classification result for the table.

Repeat these steps for each table that you want to classify.

Using SQL to classify tables in a schema

This approach uses SQL to classify all tables in a schema and set system tags on columns in each table. This procedure assumes that you:

  • This procedure assumes that you use a role with the appropriate grants.

  • Select a schema that contains less than 1000 table objects. You can use a SHOW TABLES IN SCHEMA command to determine the number of table objects in the schema.

Complete the following steps to use SQL to classify all tables in a schema and set system tags on each column in each table:

  1. Identify a schema that contains tables to classify.

  2. Call the SYSTEM$CLASSIFY_SCHEMA stored procedure to schedule the classification of tables in the schema:

    CALL SYSTEM$CLASSIFY_SCHEMA('hr.tables', {'auto_tag': true});
    
    Copy
  3. If the output contains a failed message for one or more tables, use a SNOW GRANTS ON TABLE command to determine the grants on the specified table. For example:

    SHOW GRANTS ON TABLE hr.tables.salary;
    
    Copy
  4. If necessary, grant privileges on the unclassified table to the role that calls the SYSTEM$CLASSIFY_SCHEMA stored procedure. Otherwise, call the SYSTEM$GET_CLASSIFICATION_RESULT function on each table in the schema:

    SELECT SYSTEM$GET_CLASSIFICATION_RESULT('hr.tables.empl_info');
    
    Copy
  5. Call the TAG_REFERENCES_ALL_COLUMNS Information Schema table function to confirm the tag assignments on the columns in the table:

    SELECT *
    FROM TABLE(
      hr.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
        'hr.tables.empl_info',
        'table'
    ));
    
    Copy
  6. Afterward, you can query the Account Usage DATA_CLASSIFICATION_LATEST view to see the latest classification result for the table.

Repeat these steps for each schema that contains tables that you want to classify.

Using Snowsight to classify tables in a schema

This approach uses Snowsight to stage the classification of tables in a schema and to auto-tag columns for each table in the schema. The procedure assumes that you are use a role with the appropriate grants.

To classify all tables in a schema and set system tags on each column in each table contained in the schema, complete the following steps in Snowsight:

  1. Trigger the classification and tagging of tables in the schema:

    1. In Snowsight, use the object explorer to navigate to the schema that you selected.

    2. Select the More menu (), and then select Classify Data.

    3. Select a warehouse and turn on the Auto-tagging data option.

    4. Select Classify Data and Apply Tags.

      You can select this option to reclassify a table that was classified previously.

    Snowsight classifies up to 1000 tables at a time. If your schema contains more than 1000 tables, complete this procedure for the initial batch of tables first and then repeat the procedure for the remaining tables.

    Alternatively, you can open a Worksheet and call the SYSTEM$CLASSIFY_SCHEMA as shown in Using SQL to classify tables in a schema. If you choose this option, navigate back to the schema in the object explorer after calling this stored procedure.

  2. Allow the classification process to complete. When complete, a green checkmark appears in the CLASSIFICATION column.

  3. Select View Results.

    If you decide to turn off the Auto-tagging data option, select Review Classification and evaluate the results and determine the tag value to set on each column.

    To view the tags set on the columns in the Column details tab, use a role that is granted IMPORTED PRIVILEGES on the SNOWFLAKE database.

  4. Evaluate the VALUE TO BE APPLIED column, change the tag value as needed, and select Complete classification.

  5. The Tables tab displays Classification results reviewed and applied. You can confirm the tag assignments by doing the following:

    1. Select the table.

    2. Navigate to the Columns tab.

    3. Review the TAGS column.

    Alternatively, you can do the following:

    1. Use a worksheet to call the TAG_REFERENCES_ALL_COLUMNS Information Schema table function to view the tag assignments for a columns in particular table. For other options, see Tracking system tags.

    2. Review the Account Usage DATA_CLASSIFICATION_LATEST view records by either navigating to the view using the object explorer or querying the view in a worksheet.

  6. Repeat these steps for other schemas that contain tables to classify.