snowflake.ml.modeling.metrics.roc_auc_score
snowflake.ml.modeling.metrics.roc_auc_score(*, df: DataFrame, y_true_col_names: Union[str, List[str]], y_score_col_names: Union[str, List[str]], average: Optional[str] = 'macro', sample_weight_col_name: Optional[str] = None, max_fpr: Optional[float] = None, multi_class: str = 'raise', labels: Optional[Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]] = None) → Union[float, ndarray[Any, dtype[float64]]]
Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
Note: this implementation can be used with binary, multiclass and multilabel classification, but some restrictions apply.
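As background for why both probability estimates and non-thresholded decision values are acceptable score inputs: binary ROC AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, so only the ranking of the scores matters. A minimal pure-Python illustration of that rank interpretation (not Snowflake's implementation):

```python
from itertools import product

def pairwise_auc(y_true, y_score):
    """ROC AUC via the rank interpretation: the fraction of
    (positive, negative) pairs where the positive example scores
    higher than the negative one (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
proba = [0.1, 0.4, 0.35, 0.8]          # probability estimates
decision = [-2.0, 0.4, 0.1, 3.0]       # decision values with the same ordering
print(pairwise_auc(y_true, proba))     # 0.75
print(pairwise_auc(y_true, decision))  # 0.75 -- only the ordering matters
```

Because the metric is rank-based, any monotone transformation of the scores (e.g. a sigmoid applied to decision values) leaves the AUC unchanged.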
Args:
    df: Input dataframe.
    y_true_col_names: Column name(s) representing true labels or binary label indicators.
        The binary and multiclass cases expect labels with shape (n_samples,), while the multilabel case expects binary label indicators with shape (n_samples, n_classes).
    y_score_col_names: Column name(s) representing target scores.
        In the binary case, it corresponds to an array of shape (n_samples,). Both probability estimates and non-thresholded decision values can be provided. The probability estimates correspond to the probability of the class with the greater label. The decision values correspond to the output of estimator.decision_function.
        In the multiclass case, it corresponds to an array of shape (n_samples, n_classes) of probability estimates provided by the predict_proba method. The probability estimates must sum to 1 across the possible classes. In addition, the order of the class scores must correspond to the order of labels, if provided, or else to the numerical or lexicographical order of the labels in y_true.
        In the multilabel case, it corresponds to an array of shape (n_samples, n_classes). Probability estimates are provided by the predict_proba method and the non-thresholded decision values by the decision_function method. The probability estimates correspond to the probability of the class with the greater label for each output of the classifier.
    average: {'micro', 'macro', 'samples', 'weighted'} or None, default='macro'
        If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data. Note: multiclass ROC AUC currently only handles the 'macro' and 'weighted' averages. For multiclass targets, average=None and average='micro' are only implemented for multi_class='ovr'.
        'micro': Calculate metrics globally by considering each element of the label indicator matrix as a label.
        'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
        'weighted': Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).
        'samples': Calculate metrics for each instance, and find their average.
        Will be ignored when y_true is binary.
    sample_weight_col_name: Column name representing sample weights.
    max_fpr: float > 0 and <= 1, default=None
        If not None, the standardized partial AUC [2]_ over the range [0, max_fpr] is returned. For the multiclass case, max_fpr should be either equal to None or 1.0, as partial AUC computation is not currently supported for multiclass.
    multi_class: {'raise', 'ovr', 'ovo'}, default='raise'
        Only used for multiclass targets. Determines the type of configuration to use. The default value raises an error, so either 'ovr' or 'ovo' must be passed explicitly.
        'ovr': Stands for One-vs-rest. Computes the AUC of each class against the rest. Sensitive to class imbalance even when average == 'macro', because class imbalance affects the composition of each of the 'rest' groupings.
        'ovo': Stands for One-vs-one. Computes the average AUC of all possible pairwise combinations of classes [5]_. Insensitive to class imbalance when average == 'macro'.
    labels: Only used for multiclass targets. List of labels that index the classes in y_score. If None, the numerical or lexicographical order of the labels in y_true is used.
Returns:
    Area Under the Curve score.