SYSTEM$CLASSIFY

Classifies the columns in the specified object. Optionally, you can specify the number of rows to sample and assign the recommended Data Classification system tag to each column in the object.

Syntax

CALL SYSTEM$CLASSIFY( '<object_name>' , <arg> )

Arguments

object_name

The name of the table, external table, view, or materialized view that contains the columns to classify. If no database and schema are in use in the current session, the name must be fully qualified.

The name must be specified exactly as it is stored in the database. If the name contains special characters, mixed-case letters, or spaces, enclose it in double quotes and then enclose the whole name in single quotes.

arg

Specifies how the classification process runs. Specify one of the following:

NULL

Snowflake uses its default configuration, based on the number of rows in the specified object. System tags are not set on any columns in the specified object.

{}

An empty object, which is functionally equivalent to specifying NULL.

{'sample_count': integer}

Specifies the number of rows to sample in the specified object. Specify an integer from 1 to 10000, inclusive.

{'auto_tag': true}

Sets the recommended classification system tags on the columns in the specified object when the classification process is complete.

When you use this argument, call the stored procedure with the role that has the OWNERSHIP privilege on the schema.

{'sample_count': integer, 'auto_tag': true}

Samples the specified number of rows in the specified object and, when the classification process is complete, sets the recommended system tag on each column in the object.

When you use this argument, call the stored procedure with the role that has the OWNERSHIP privilege on the schema.

{'use_all_custom_classifiers': true}

Snowflake evaluates all custom classification instances and recommends the tag associated with a custom classification instance based on the classification result.

This option uses the custom classifiers that are accessible to the role that calls the stored procedure (that is, the current role, because the procedure runs with caller’s rights). For details, see Understanding Caller’s Rights and Owner’s Rights Stored Procedures.

{'custom_classifiers': ['instance_name1' [ , 'instance_name2' ... ] ]}

Specifies the custom classification instances to evaluate as sources for the recommended tag to set on each column.

To specify multiple instances, separate them with commas.
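As a sketch of the two custom-classifier argument forms above, the following calls classify the hr.tables.empl_info table. The instance names medical_codes and employee_ids are hypothetical and assume custom classification instances with those names are accessible to the current role; depending on your session context, the names might need to be fully qualified.

```sql
-- Evaluate every custom classification instance that the current role can access.
CALL SYSTEM$CLASSIFY('hr.tables.empl_info', {'use_all_custom_classifiers': true});

-- Evaluate only the listed instances (hypothetical names), separated by commas.
CALL SYSTEM$CLASSIFY('hr.tables.empl_info', {'custom_classifiers': ['medical_codes', 'employee_ids']});
```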

Returns

Returns a JSON object in the following format:

{
  "classification_result": {
    "col1_name": {
      "alternates": [],
      "recommendation": {
        "confidence": "HIGH",
        "coverage": 1,
        "details": [
          {
            "coverage": 1,
            "semantic_category": "US_PASSPORT"
          }
        ],
        "privacy_category": "IDENTIFIER",
        "semantic_category": "PASSPORT"
      },
      "valid_value_ratio": 1
    },  
    "col2_name": { ... },
    ...
  }
}

Where:

alternates

Lists each alternative tag and value that Snowflake considered in addition to the recommended tag.

recommendation

Specifies the tag and value that the classification process recommends as the primary choice.

The following fields can appear in both alternates and recommendation:

classifier_name

The fully qualified name of the custom classification instance that was used to tag the classified column.

This field appears only when a custom classification instance is used as the source of the tag to set on a column.

confidence

Specifies one of the following values: HIGH, MEDIUM, or LOW. This value indicates Snowflake's relative confidence in the recommendation, based on the column sampling process and on how closely the column data matches the way Snowflake classifies data.

coverage

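Specifies the proportion of sampled cell values, expressed as a number from 0 to 1, that match the rules for a particular category. For example, a coverage of 1 means that every sampled value matched.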

details

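Lists the geographical (country-specific) values of the SEMANTIC_CATEGORY tag that contribute to the recommendation, with the coverage of each. For example, US_PASSPORT contributes to the PASSPORT semantic category.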

privacy_category

Specifies the privacy category tag value.

The possible values are IDENTIFIER, QUASI_IDENTIFIER, and SENSITIVE.

semantic_category

Specifies the semantic category tag value.

For possible tag values, see System tags and categories.

valid_value_ratio

Specifies the ratio of valid values to the total number of sampled values. Invalid values include NULL, an empty string, and a string with more than 256 characters.
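To review the fields described above as rows rather than as nested JSON, you can flatten classification_result with one row per classified column. This is a minimal sketch that uses the standard RESULT_SCAN and FLATTEN table functions. It assumes the JSON object is returned in the first column of the CALL result as a semi-structured OBJECT; if your result arrives as a string, wrap that column in PARSE_JSON first.

```sql
-- Run the classification; its result becomes the most recent query result.
CALL SYSTEM$CLASSIFY('hr.tables.empl_info', NULL);

-- Assumption: the CALL returns the JSON object in its first column ($1).
WITH result AS (
  SELECT $1 AS classification
  FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
)
-- One row per column in classification_result, with the recommended tag values.
SELECT
  f.key                                             AS column_name,
  f.value:recommendation:semantic_category::VARCHAR AS semantic_category,
  f.value:recommendation:privacy_category::VARCHAR  AS privacy_category,
  f.value:recommendation:confidence::VARCHAR        AS confidence,
  f.value:recommendation:coverage::FLOAT            AS coverage,
  f.value:valid_value_ratio::FLOAT                  AS valid_value_ratio
FROM result,
     LATERAL FLATTEN(input => result.classification:classification_result) AS f;
```

Because LAST_QUERY_ID() refers to the previous statement in the session, run the SELECT immediately after the CALL.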

Usage notes

Examples

Classify a table:

CALL SYSTEM$CLASSIFY('hr.tables.empl_info', null);

Classify a table and specify the number of rows to sample:

CALL SYSTEM$CLASSIFY('hr.tables.empl_info', {'sample_count': 1000});

Classify a table and set the recommended system tags on its columns:

CALL SYSTEM$CLASSIFY('hr.tables.empl_info', {'auto_tag': true});

Classify a table, specify the number of rows to sample, and set the recommended system tag on each column in the table:

CALL SYSTEM$CLASSIFY('hr.tables.empl_info', {'sample_count': 1000, 'auto_tag': true});
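Classify a table whose name contains mixed-case letters and a space. In this sketch, "Empl Info" is a hypothetical table name in the hr.tables schema; the table identifier is enclosed in double quotes, and the whole name is then enclosed in single quotes, as described for the object_name argument:

```sql
-- "Empl Info" is a hypothetical mixed-case table name that contains a space.
CALL SYSTEM$CLASSIFY('hr.tables."Empl Info"', NULL);
```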