You are viewing documentation about an older version (1.0.9).

snowflake.ml.modeling.metrics.log_loss

snowflake.ml.modeling.metrics.log_loss(*, df: DataFrame, y_true_col_names: Union[str, List[str]], y_pred_col_names: Union[str, List[str]], eps: Union[float, str] = 'auto', normalize: bool = True, sample_weight_col_name: Optional[str] = None, labels: Optional[Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]] = None) → float

Log loss, aka logistic loss or cross-entropy loss.

This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of a logistic model that returns y_pred probabilities for its training data y_true. The log loss is only defined for two or more labels. For a single sample with true label \(y \in \{0,1\}\) and a probability estimate \(p = \operatorname{Pr}(y = 1)\), the log loss is:

\[L_{\log}(y, p) = -(y \log (p) + (1 - y) \log (1 - p))\]
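As a quick check of the formula, here is a plain-Python sketch of the per-sample binary loss. It does not use Snowflake at all, and the helper name binary_log_loss is hypothetical; it simply evaluates the expression above for a single (y, p) pair.

```python
import math


def binary_log_loss(y: int, p: float) -> float:
    """Hypothetical helper: -(y * log(p) + (1 - y) * log(1 - p)).

    y is the true label (0 or 1) and p is the predicted probability Pr(y = 1).
    Assumes 0 < p < 1; the eps parameter below covers p = 0 or p = 1.
    """
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# A confident, correct prediction is cheap; a confident, wrong one is expensive.
print(binary_log_loss(1, 0.9))  # -log(0.9), about 0.105
print(binary_log_loss(0, 0.9))  # -log(0.1), about 2.303
```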
Args:

df: Input dataframe.

y_true_col_names: Column name(s) representing actual values.

y_pred_col_names: Column name(s) representing predicted probabilities, as returned by a classifier's predict_proba method. If y_pred has shape (n_samples,), the probabilities provided are assumed to be those of the positive class. The labels in y_pred are assumed to be ordered alphabetically, as done by LabelBinarizer.

eps: float or "auto", default="auto"

Log loss is undefined for p=0 or p=1, so probabilities are clipped to max(eps, min(1 - eps, p)). The default will depend on the data type of y_pred and is set to np.finfo(y_pred.dtype).eps.

normalize: If True, return the mean loss per sample; otherwise, return the sum of the per-sample losses.

sample_weight_col_name: Column name representing sample weights.

labels: If not provided, labels will be inferred from y_true. If labels is None and y_pred has shape (n_samples,), the labels are assumed to be binary and are inferred from y_true.
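The column-ordering and label rules above can be illustrated with a small plain-Python sketch. It assumes the scikit-learn convention the text refers to: LabelBinarizer sorts the labels, and a single probability column is read as the probability of the second sorted label. The function per_sample_losses is hypothetical and is not part of snowflake.ml. With a one-hot true label, the multiclass loss for a sample reduces to -log of the probability assigned to its true class.

```python
import math
from typing import List, Optional, Sequence, Union

# A row is either one probability per class, or a single positive-class probability.
Row = Union[float, Sequence[float]]


def per_sample_losses(
    y_true: Sequence[str],
    y_pred: Sequence[Row],
    labels: Optional[Sequence[str]] = None,
) -> List[float]:
    """Hypothetical helper (not part of snowflake.ml): -log P(true label) per row.

    Columns of y_pred are matched to labels in sorted order, as LabelBinarizer
    does. A bare number per row is read as the probability of the second
    sorted label (the positive class).
    """
    # Infer the label set from y_true unless it is given explicitly.
    classes = sorted(labels) if labels is not None else sorted(set(y_true))
    if len(classes) < 2:
        raise ValueError("log loss needs at least two labels; pass `labels`")
    index = {label: k for k, label in enumerate(classes)}
    losses = []
    for label, row in zip(y_true, y_pred):
        if isinstance(row, (int, float)):
            row = [1.0 - row, row]  # [P(first label), P(second label)]
        if len(row) != len(classes):
            raise ValueError("number of probability columns must match the labels")
        losses.append(-math.log(row[index[label]]))
    return losses


# Three classes, so the columns are ordered ["cat", "dog", "fish"].
multi = per_sample_losses(
    ["dog", "cat", "fish"],
    [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.2, 0.2, 0.6]],
)

# One column holding P("yes"). y_true contains only "yes", so labels are passed.
binary = per_sample_losses(["yes", "yes"], [0.8, 0.6], labels=["no", "yes"])
print(multi, binary)
```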

Returns:

float: The log loss (logistic loss or cross-entropy loss): the mean loss per sample if normalize is True, otherwise the sum of the per-sample losses.
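To show how eps, normalize, and sample weights combine into the single float that is returned, here is a plain-Python sketch for the binary case. It is not the snowflake.ml implementation: the name log_loss_sketch is hypothetical, eps is fixed to the float64 machine epsilon (what "auto" resolves to for float64 data), and the weighting follows scikit-learn's convention (a weighted mean when normalize is True, a weighted sum otherwise), which is assumed rather than confirmed for Snowflake.

```python
import math
import sys
from typing import Optional, Sequence

# 64-bit machine epsilon, the same value as np.finfo(np.float64).eps.
FLOAT64_EPS = sys.float_info.epsilon


def log_loss_sketch(
    y_true: Sequence[int],
    p_pos: Sequence[float],
    eps: float = FLOAT64_EPS,
    normalize: bool = True,
    sample_weight: Optional[Sequence[float]] = None,
) -> float:
    """Hypothetical helper (not part of snowflake.ml): binary log loss.

    y_true holds 0/1 labels and p_pos holds P(y = 1) for each sample.
    """
    weights = list(sample_weight) if sample_weight is not None else [1.0] * len(y_true)
    total = 0.0
    for y, p, w in zip(y_true, p_pos, weights):
        p = max(eps, min(1 - eps, p))  # clip so math.log never sees 0
        total += w * -(y * math.log(p) + (1 - y) * math.log(1 - p))
    # Assumed scikit-learn convention: weighted mean if normalize, else weighted sum.
    return total / sum(weights) if normalize else total


y_true = [1, 0, 1, 1]
p_pos = [0.9, 0.2, 0.6, 1.0]  # without clipping, p = 1.0 makes math.log(1 - p) raise

mean_loss = log_loss_sketch(y_true, p_pos)
sum_loss = log_loss_sketch(y_true, p_pos, normalize=False)
weighted = log_loss_sketch(y_true, p_pos, sample_weight=[1.0, 1.0, 2.0, 1.0])
print(mean_loss, sum_loss, weighted)
```

With unit weights the sum is simply the mean times the number of samples, which is why normalize only changes the scale of the result, not the ranking of models compared on the same data.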