Introduction to Classification

Classification is a process that analyzes and categorizes the information stored in the columns of database tables and views.

Once the process completes, classification uses object tags to label the data. The tags can then be used to support analysis and to help comply with privacy regulations.

What is Classification?

Classification enables answering questions about the data stored in tables and views, such as:

  • Does the table/view contain PII (Personally Identifiable Information) or sensitive data?

  • Where is the data stored and how long has it been stored?

  • How can the data be protected from exposure while still deriving insights?

The classification process samples all the supported columns in a table or view and uses the column names and values to classify the data into system categories provided by Snowflake. The categories can be assigned to the columns as tags, which can be set manually or using the provided stored procedure.

Classification Use Cases

Once the tags produced by classification have been assigned to a table, view, or column, they can be used to enable a variety of data governance, sharing, and privacy use cases, including:

PII Classification

You can use classification to identify PII (Personally Identifiable Information) in your data to mitigate risk and meet compliance requirements.

Data Access

You can use classification tags to configure security controls to prevent unauthorized access to personal data.

Policy Management

You can use classification tags to determine how to set masking policies to protect the privacy of the data.


Anonymization

You can use classification to streamline anonymization of personal data. Anonymization relies on classification privacy categories to protect the identity of the associated subjects while still making their data available for analysis.

Supported Objects and Column Data Types

Snowflake supports classifying data stored in all types of tables and views, including:

  • External tables

  • Materialized views

  • Secure views

You can classify table and view columns for all supported data types except for the following data types:




    Note that you can classify a column with the VARIANT data type when the column data type can be cast to a NUMBER or STRING data type. Snowflake does not classify the column if the column contains JSON, XML, or other semi-structured data.

If a table or view contains columns that are not of a supported data type, or columns that contain only NULL values, the classification process ignores those columns and does not include them in the output.


If your data represents NULL values with a value other than NULL, the accuracy of the classification results may be impacted.
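One possible way to reduce this effect is to map placeholder values back to true NULLs before classifying. The sketch below illustrates the idea in Python on an in-memory list of sampled values; the placeholder set and both function names are assumptions made for illustration, not part of Snowflake.

```python
from typing import Iterable, List, Optional

# Hypothetical placeholder strings that a dataset might use instead of NULL.
PLACEHOLDERS = frozenset({"", "N/A", "NULL", "UNKNOWN"})


def normalize_nulls(values: Iterable[Optional[str]],
                    placeholders: frozenset = PLACEHOLDERS) -> List[Optional[str]]:
    """Replace placeholder strings (case-insensitive, trimmed) with None."""
    return [
        None if v is None or v.strip().upper() in placeholders else v
        for v in values
    ]


def is_all_null(values: Iterable[Optional[str]]) -> bool:
    """True when every value is None, i.e. the column would be ignored."""
    return all(v is None for v in values)


sample = ["Ann", "n/a", None, " "]
print(normalize_nulls(sample))
print(is_all_null(normalize_nulls(["N/A", "unknown"])))
```

In practice the same normalization would more likely be done in SQL (for example, in a view over the table); the Python form only shows the logic.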

Compute Costs

The classification process requires compute resources, which are provided by the virtual warehouse that is in use and running when classification is performed.

The amount of time needed to classify the data in a table/view (and, therefore, the number of credits consumed by the warehouse) is a function of the amount of data to be classified.

In particular, a table/view with a large number of columns that support classification can take noticeably longer to process. However, as a general rule, processing speed scales linearly with warehouse size. In other words, each size increase for a warehouse (e.g. X-Small to Small) typically cuts the processing time in half.
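The halving rule of thumb can be turned into a rough estimate. The following Python sketch assumes the rule holds exactly for each size step, which real workloads will only approximate; the function name and the baseline figure are hypothetical.

```python
# Warehouse sizes in ascending order; each step up is assumed to halve the time.
WAREHOUSE_SIZES = ["X-Small", "Small", "Medium", "Large"]


def estimated_minutes(baseline_minutes: float,
                      baseline_size: str,
                      target_size: str) -> float:
    """Estimate processing time after a resize, halving per size step."""
    steps = WAREHOUSE_SIZES.index(target_size) - WAREHOUSE_SIZES.index(baseline_size)
    # A negative step count (downsizing) doubles the estimate per step.
    return baseline_minutes / (2 ** steps)


# A job measured at 40 minutes on X-Small, re-estimated for Medium (two steps up).
print(estimated_minutes(40.0, "X-Small", "Medium"))
```

Treat the result as a planning estimate only; a shorter runtime on a larger warehouse does not necessarily mean fewer credits consumed.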

Use the following general guidelines to select a warehouse size:

  • No concern for processing time: X-Small warehouse.

  • Up to 100 columns in a table: Small warehouse.

  • 101 to 300 columns in a table: Medium warehouse.

  • 301 columns or more in a table: Large warehouse.
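The guidelines above can be expressed as a small lookup. This is a minimal Python sketch of those thresholds only; the function name and the `time_sensitive` flag are assumptions introduced for illustration.

```python
def suggest_warehouse_size(column_count: int, time_sensitive: bool = True) -> str:
    """Map a table's column count to the warehouse size suggested above."""
    if column_count < 0:
        raise ValueError("column_count must be non-negative")
    if not time_sensitive:
        # No concern for processing time: the smallest warehouse is enough.
        return "X-Small"
    if column_count <= 100:
        return "Small"
    if column_count <= 300:
        return "Medium"
    return "Large"


for count in (50, 250, 500):
    print(count, suggest_warehouse_size(count))
```

The thresholds are general guidance, so a result from this function is a starting point rather than a guarantee of acceptable runtime.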

Classification Categories

Snowflake utilizes two category types for classifying data in table/view columns:

  • Semantic categories

  • Privacy categories

Semantic Categories

A semantic category identifies a column as storing personal attributes. Some of the semantic categories supported by Snowflake include:

  • Name

  • Address

  • Zip code

  • Phone number (currently US numbers only)

  • Age

  • Gender

For a complete list of the semantic categories supported in the current release, see Category Tag Values and Mappings. Additional semantic categories will be added in future releases.

Privacy Categories

If a column is determined to have a semantic category, the column is further classified according to one of the following privacy categories:


Identifier

Also known as direct identifiers, these attributes uniquely identify an individual (e.g. name, social security number, or phone number).

Quasi-identifier

Also known as indirect identifiers, these attributes, when combined with other attributes, can be used to uniquely identify an individual (e.g. age + gender + zip).

Sensitive

Personal attributes that are not identifying, but are information that individuals do not want disclosed for privacy reasons (e.g. salary or medical/healthcare status).


Multiple semantic categories from all three privacy categories may be considered “Sensitive Personal Data”, “Special Categories of Data”, or similar terms under laws and regulations, and may require additional protections or controls.

Currently, classification does not tag data as both sensitive and identifying. In other words, classification is an “either-or” operation, which you must consider when creating rules to govern access to data identified as sensitive.
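To illustrate why the "either-or" behavior matters for access rules, the following Python sketch models each column as carrying at most one privacy category. The column names and the helper function are hypothetical; the category values are the ones named in this topic.

```python
from typing import Dict, FrozenSet, List, Optional

ALL_PRIVACY_CATEGORIES = frozenset({"IDENTIFIER", "QUASI_IDENTIFIER", "SENSITIVE"})


def columns_to_protect(column_tags: Dict[str, Optional[str]],
                       protected: FrozenSet[str] = ALL_PRIVACY_CATEGORIES) -> List[str]:
    """Return columns whose single privacy category is in the protected set."""
    return sorted(col for col, cat in column_tags.items() if cat in protected)


# Hypothetical classification results: one category (or none) per column.
tags = {
    "full_name": "IDENTIFIER",
    "zip": "QUASI_IDENTIFIER",
    "salary": "SENSITIVE",
    "order_id": None,
}

# A rule keyed only on SENSITIVE misses the identifying columns.
print(columns_to_protect(tags, frozenset({"SENSITIVE"})))
# Listing every category of interest covers them.
print(columns_to_protect(tags))
```

Because a column is never tagged as both sensitive and identifying, a rule intended to cover all personal data should enumerate each privacy category explicitly.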

Semantic Category Probabilities and Alternates

In addition to identifying the semantic category and privacy category for a column, Snowflake also returns the following information about the semantic category for the column:

  • The probability that the classification process derived the correct semantic category.

  • A list of alternate semantic categories with which the column can be tagged (if the probability is below the 0.80 threshold and the process identified other possible semantic categories with a probability greater than 0.15).
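The two thresholds above can be read as the following selection rule. This Python sketch is one interpretation of that description, not Snowflake's implementation, and it does not model the actual output format of the classification function; the function name and the sample probabilities are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

PRIMARY_THRESHOLD = 0.80    # below this, alternates are reported
ALTERNATE_THRESHOLD = 0.15  # alternates must exceed this probability


def summarize_candidates(
    probabilities: Dict[str, float],
) -> Tuple[Optional[str], float, List[str]]:
    """Pick the most probable category and list qualifying alternates."""
    if not probabilities:
        return None, 0.0, []
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    best_name, best_prob = ranked[0]
    alternates: List[str] = []
    if best_prob < PRIMARY_THRESHOLD:
        alternates = [name for name, prob in ranked[1:] if prob > ALTERNATE_THRESHOLD]
    return best_name, best_prob, alternates


# Illustrative values only: a low-confidence result with one viable alternate.
print(summarize_candidates({"NAME": 0.60, "ADDRESS": 0.30, "AGE": 0.05}))
# A high-confidence result reports no alternates.
print(summarize_candidates({"NAME": 0.95, "ADDRESS": 0.40}))
```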

For more details, see the EXTRACT_SEMANTIC_CATEGORIES function.

System Tags

Classification utilizes pre-defined system tags for the semantic and privacy categories:

  • For the SEMANTIC_CATEGORY tag, the possible tag values are the semantic categories (NAME, AGE, etc.). For the complete list of possible semantic category values, see Category Tag Values and Mappings.

  • For the PRIVACY_CATEGORY tag, the possible tag values are the privacy categories (IDENTIFIER, QUASI_IDENTIFIER, or SENSITIVE).

The system tags are stored in the CORE schema in the SNOWFLAKE read-only shared database. To view the tag names, use the SHOW TAGS command.

For example:

    SHOW TAGS IN SCHEMA SNOWFLAKE.CORE;



To view the values assigned to the system tags after the tags have been extracted, see Viewing and Tracking Classification Data.
