SYSTEM$CLASSIFY¶
Classifies the specified object with the option to specify the number of rows to sample and assign the recommended Data Classification system tag to each column in the specified object.
Syntax¶
SYSTEM$CLASSIFY( '<object_name>' , <object> )
Arguments¶
object_name
The name of the table, external table, view, or materialized view containing the columns to be classified. If a database and schema are not in use in the current session, the name must be fully-qualified.
The name must be specified exactly as it is stored in the database. If the name contains special characters, capitalization, or blank spaces, the name must be enclosed first in double-quotes and then in single quotes.
object
Specifies a JSON OBJECT to determine how the classification process works. One of the following:
NULL
Snowflake uses its default configuration based on the number of rows in in the specified object. System tags are not set on any columns in the specified object.
{}
An empty object, which is functionally equivalent to specifying
NULL
.{'sample_count': integer}
Specifies the number of rows to sample in the specified object. Any number from
1
to10000
, inclusive.{'auto_tag': true}
Sets the recommended classification system tags on the columns in the specified object when the classification process is complete.
When you use this argument, call the stored procedure with the role that has the OWNERSHIP privilege on the schema.
{'sample_count': integer, 'auto_tag': true}
Classify the specified object while specifying the number of rows to sample and set the recommended system tag on each column in the specified object when the classification process is complete.
When you use this argument, call the stored procedure with the role that has the OWNERSHIP privilege on the schema.
{'use_all_custom_classifiers': true}
Snowflake evaluates all custom classification instances and recommends the tag associated with a custom classification instance based on the classification result.
This option uses the custom classifiers that are accessible to the role in use that calls the stored procedure (current role, caller’s rights). For details, see Understanding caller’s rights and owner’s rights stored procedures.
{'custom_classifiers': ['instance_name1' [ , 'instance_name2' ... ] ]}
Specifies the custom classification instance to evaluate as a source for the recommended tag to be set on the column.
You can specify multiple instances in the list and separate each instance with a comma.
Returns¶
Returns a JSON object in the following format. For example:
{
"classification_result": {
"col1_name": {
"alternates": [],
"recommendation": {
"confidence": "HIGH",
"coverage": 1,
"details": [
{
"coverage": 1,
"semantic_category": "US_PASSPORT"
}
],
"privacy_category": "IDENTIFIER",
"semantic_category": "PASSPORT"
},
"valid_value_ratio": 1
},
"col2_name": { ... },
...
}
}
Where:
alternates
Specifies information about each tag and value to consider other than the recommended tag.
recommendation
Specifies information about each tag and value as the primary choice based on the classification process.
These values can appear in both the alternates and recommendation:
classifier_name
The fully-qualified name of the custom classification instance that was used to tag the classified column.
This field only appears when using a custom classification instance as the source of the tag to set on a column.
confidence
Specifies one of the following values:
HIGH
,MEDIUM
, orLOW
. This value indicates the relative confidence that Snowflake has based upon the column sampling process and how the column data aligns with how Snowflake classifies data.coverage
Specifies the percent of sampled cell values that match the rules for a particular category.
details
Specifies fields and values that refer to a geographical tag value for the SEMANTIC_CATEGORY tag.
privacy_category
Specifies the privacy category tag value.
The possible values are
IDENTIFIER
,QUASI-IDENTIFIER
andSENSITIVE
.semantic_category
Specifies the semantic category tag value.
For possible tag values, see System tags and categories and System tags and categories.
valid_value_ratio
Specifies the ratio of valid values in the sample size. Invalid values include NULL, an empty string, and a string with more than 256 characters.
Usage notes¶
Snowflake-provided stored procedures utilize caller’s rights. For more details, see Understanding caller’s rights and owner’s rights stored procedures.
If you want to apply alternate system tag values, use an ALTER TABLE … MODIFY COLUMN … SET TAG statement to update the tag value.
To unset a Classification system tag from a column, use an ALTER TABLE … MODIFY COLUMN … UNSET TAG statement.
Examples¶
Classify a table:
CALL SYSTEM$CLASSIFY('hr.tables.empl_info', null);
Classify a table and specify the number of rows to sample:
CALL SYSTEM$CLASSIFY('hr.tables.empl_info', {'sample_count': 1000});
Classify a table and set the system tags to the columns:
CALL SYSTEM$CLASSIFY('hr.tables.empl_info', {'auto_tag': true});
Classify a table, and specify the number of rows to sample and set the recommended system tag to each column in the table:
CALL SYSTEM$CLASSIFY('hr.tables.empl_info', {'sample_count': 1000, 'auto_tag': true});