Classify sensitive data manually

This topic provides information on how to classify tables in a schema, review the classification results, and set system tags on the columns in the tables by using SQL or Snowsight.

Overview

A data steward can classify tables in a schema to meet data privacy regulations. These tables include the following scenarios:

  • New tables in a schema.

  • New columns or changed columns in a table within the same schema.

  • Previously classified tables that might need to be classified again.

Evaluating these scenarios enables the data steward to classify sensitive and personal information. Setting system tags on the column can facilitate data monitoring. Subsequently, a data engineer can protect the personal and sensitive data with a masking policy or row access policy in Snowsight or using SQL.

Get started classifying data

Before you classify a single table or tables in a schema:

  • Choose a workflow:

  • Decide on the warehouse to use. For example, use a warehouse that matches the cost center or business unit to which you belong to enable accurate reporting and budgeting.

  • Factor latency into your workflow. Snowsight updates the Databases area of Snowsight every 12 hours, at which point the updated tables can be classified.

  • Confirm the access control configuration for your current role:

    • Use a SHOW GRANTS TO ROLE statement to see the grants on objects a user with that role can access. The role in use must have these grants as a minimum:

      • USAGE on the warehouse to use during the classification process.

      • SELECT on the table to start the classification process.

      • The SNOWFLAKE.CORE_VIEWER database role, which allows setting the system tags on columns.

      • The SNOWFLAKE.GOVERNANCE_VIEWER database role to query the Account Usage DATA_CLASSIFICATION_LATEST view.

    • For other combinations of grants, see the Data Classification privilege reference.

    • The access control model allows for different personas to participate in the workflows. For example, a data steward can stage the classification of tables in a schema, and the data engineer can evaluate the results in Snowsight. Choose the access control model and workflow that works best for you.

Cost of manually classifying sensitive data

The manual classification process requires compute resources, which are provided by the virtual warehouse that is in use and running when classification is performed.

The amount of time needed to classify the data in a table/view (and, therefore, the number of credits consumed by the warehouse) is a function of the columns to be classified.

In particular, if a table or view has a large number of columns that support classification, the processing time can be impacted. However, as a general rule, the processing speed scales linearly with the warehouse size. In other words, each size increase for a warehouse (X-small to Small) typically reduces the processing time by half.

Use the following general guidelines to select a warehouse size:

  • No concern for processing time: x-small warehouse.

  • Up to 100 columns in a table: small warehouse.

  • 101 to 300 columns in a table: medium warehouse.

  • More than 300 in a table: large warehouse.

For information, see Warehouse considerations.

Using SQL to classify a single table

This approach uses SQL to classify a table and set system tags on columns in the tables. Review the Get started classifying data section to ensure that the role in use has the necessary access control grants to classify the tables in the schema.

This procedure assumes that you use a role with the appropriate grants. Complete the following steps to use SQL to classify tables in a schema and set system tags on the table columns:

  1. Identify a table to classify.

  2. Call the SYSTEM$CLASSIFY stored procedure to classify and tag the columns in the table:

    CALL SYSTEM$CLASSIFY('hr.tables.empl_info', {'auto_tag': true});
    
    Copy
  3. Call the TAG_REFERENCES_ALL_COLUMNS Information Schema table function to confirm the tag assignments on the columns in the table:

    SELECT *
    FROM TABLE(
      hr.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
        'hr.tables.empl_info',
        'table'
    ));
    
    Copy
  4. Afterward, you can query the Account Usage DATA_CLASSIFICATION_LATEST view to see the latest classification result for the table.

Repeat these steps for each table that you want to classify.

Using SQL to asynchronously classify tables in a schema

This approach uses SQL to asynchronously classify all tables in a schema and set system tags on columns in each table. This procedure assumes that you:

  • This procedure assumes that you use a role with the appropriate grants.

  • Select a schema that contains less than 1000 table objects. You can use a SHOW TABLES IN SCHEMA command to determine the number of table objects in the schema.

Complete the following steps to use SQL to classify all tables in a schema and set system tags on each column in each table:

  1. Identify a schema that contains tables to classify.

  2. Call the SYSTEM$CLASSIFY_SCHEMA stored procedure to schedule the classification of tables in the schema:

    CALL SYSTEM$CLASSIFY_SCHEMA('hr.tables', {'auto_tag': true});
    
    Copy

    Be careful not to classify too many schemas simultaneously. For more information, see the usage notes.

  3. To view the results call the SYSTEM$GET_CLASSIFICATION_RESULT function on each table in the schema:

    SELECT SYSTEM$GET_CLASSIFICATION_RESULT('hr.tables.empl_info');
    
    Copy
  4. Call the TAG_REFERENCES_ALL_COLUMNS Information Schema table function to confirm the tag assignments on the columns in the table:

    SELECT *
    FROM TABLE(
      hr.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
        'hr.tables.empl_info',
        'table'
    ));
    
    Copy
  5. Afterward, you can query the Account Usage DATA_CLASSIFICATION_LATEST view to see the latest classification result for the table.

Repeat these steps for each schema that contains tables that you want to classify.

Tip

If you want to cancel the classification of tables in a schema, call the SYSTEM$CANCEL_CLASSIFY_SCHEMA stored procedure.

Using Snowsight to classify tables in a schema

This approach uses Snowsight to stage the classification of tables in a schema and to auto-tag columns for each table in the schema. The procedure assumes that you are use a role with the appropriate grants.

To classify all tables in a schema and set system tags on each column in each table contained in the schema, complete the following steps in Snowsight:

  1. Trigger the classification and tagging of tables in the schema:

    1. In Snowsight, use the object explorer to navigate to the schema that you selected.

    2. Select the More menu (), and then select Classify and Tag Sensitive Data.

    3. Select a warehouse if a warehouse is not already in use.

    4. Select the tables that you want to classify. By default, tables are not selected.

    5. Evaluate the Advanced Options and select them as needed:

      • The Auto-tagging data option enables you to automatically assign tags to columns after classifying data. This option is enabled by default and you can disable it if needed.

      • The Include custom classifiers option enables you to use the custom classifiers you can access to classify data.

        To determine the custom classifiers you can access, select View custom classifiers and run the command in the worksheet.

  2. Select Classify and Tag Sensitive Data.

    You can select this option to reclassify a table that was classified previously.

    Snowsight classifies up to 1000 tables.

    Alternatively, you can open a Worksheet and call the SYSTEM$CLASSIFY_SCHEMA as shown in Using SQL to asynchronously classify tables in a schema. If you choose this option, navigate back to the schema in the object explorer after calling this stored procedure. Be mindful about the number of times you call this stored procedure in a relatively short time period. For more information, see the usage notes.

  3. Allow the classification process to complete. When complete, a green checkmark appears in the CLASSIFICATION column.

  4. Select View Results.

  5. Change the tag value as needed. To do this, use a role that is granted IMPORTED PRIVILEGES on the SNOWFLAKE database.

  6. Follow the prompts to review and approve tables in the schema. Update the tag values as needed.

  7. Select Complete classification.

  8. The Tables tab displays Classification results reviewed and applied. You can confirm the tag assignments by doing the following:

    1. Select the table.

    2. Navigate to the Columns tab.

    3. Review the TAGS column.

    Alternatively, you can do the following:

    1. Use a worksheet to call the TAG_REFERENCES_ALL_COLUMNS Information Schema table function to view the tag assignments for a columns in particular table. For other options, see Tracking system tags.

    2. Review the Account Usage DATA_CLASSIFICATION_LATEST view records by either navigating to the view using the object explorer or querying the view in a worksheet.

  9. Repeat these steps for other schemas that contain tables to classify.