You are viewing documentation about an older version (1.0.9).

snowflake.ml.modeling.linear_model.LogisticRegression

class snowflake.ml.modeling.linear_model.LogisticRegression(*, penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None, input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, label_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False, sample_weight_col: Optional[str] = None)

Bases: BaseTransformer

Logistic Regression (aka logit, MaxEnt) classifier. For more details on this class, see sklearn.linear_model.LogisticRegression.

penalty: {‘l1’, ‘l2’, ‘elasticnet’, None}, default=’l2’

Specify the norm of the penalty:

  • None: no penalty is added;

  • ‘l2’: add an L2 penalty term (the default choice);

  • ‘l1’: add an L1 penalty term;

  • ‘elasticnet’: both L1 and L2 penalty terms are added.

dual: bool, default=False

Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

tol: float, default=1e-4

Tolerance for stopping criteria.

C: float, default=1.0

Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

fit_intercept: bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling: float, default=1

Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.

Note: the synthetic feature weight is subject to L1/L2 regularization, like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.
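The arithmetic of the ‘liblinear’ intercept trick can be illustrated with a minimal NumPy sketch. The helpers add_synthetic_feature and split_weights are hypothetical names used only for illustration; they are not part of Snowflake ML or scikit-learn, and the weights are made-up values rather than the output of a real fit.

```python
import numpy as np


def add_synthetic_feature(X, intercept_scaling):
    """Append a constant column equal to intercept_scaling to every row of X."""
    X = np.asarray(X, dtype=float)
    synthetic = np.full((X.shape[0], 1), float(intercept_scaling))
    return np.hstack([X, synthetic])


def split_weights(w_aug, intercept_scaling):
    """Split weights learned on the augmented X into (coef, intercept)."""
    w_aug = np.asarray(w_aug, dtype=float)
    return w_aug[:-1], intercept_scaling * w_aug[-1]


X = np.array([[1.0, 2.0], [3.0, -1.0]])
intercept_scaling = 10.0
w_aug = np.array([0.5, -0.25, 0.5])  # last entry is the synthetic feature weight

coef, intercept = split_weights(w_aug, intercept_scaling)
# The augmented model and the (coef, intercept) model produce identical scores.
scores_aug = add_synthetic_feature(X, intercept_scaling) @ w_aug
scores = X @ coef + intercept
assert np.allclose(scores_aug, scores)

# Reaching an intercept of 5.0 needs a synthetic weight of 5.0 / intercept_scaling,
# so that weight's squared-L2 contribution to the penalty shrinks as scaling grows.
for s in (1.0, 10.0):
    print(s, (5.0 / s) ** 2)
```

With intercept_scaling=1 the synthetic weight must be 5.0 (squared value 25.0); with intercept_scaling=10 it only needs to be 0.5 (squared value 0.25), which is why increasing intercept_scaling weakens the regularization of the intercept.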

class_weight: dict or ‘balanced’, default=None

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
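The “balanced” formula and its interaction with sample_weight can be sketched in NumPy as follows. The formula above uses np.bincount(y), which assumes integer-encoded labels, so this sketch first encodes arbitrary labels with np.unique. The function names and the spam/ham labels are hypothetical and chosen only for illustration.

```python
import numpy as np


def balanced_class_weights(y):
    """Return {class_label: n_samples / (n_classes * count_of_label)}."""
    classes, encoded = np.unique(np.asarray(y), return_inverse=True)
    counts = np.bincount(encoded)  # occurrences of each class, in sorted label order
    weights = len(encoded) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def effective_weights(y, class_weight, sample_weight=None):
    """Per-sample weight: class weight of the sample's label, times sample_weight if given."""
    w = np.array([class_weight[label] for label in np.asarray(y).tolist()], dtype=float)
    if sample_weight is not None:
        w = w * np.asarray(sample_weight, dtype=float)
    return w


y = ["spam", "ham", "ham", "ham"]
cw = balanced_class_weights(y)
print(cw)
print(effective_weights(y, cw, sample_weight=[1.0, 1.0, 2.0, 1.0]))
```

The rare class ("spam", 1 of 4 samples) receives weight 4 / (2 * 1) = 2.0, while the common class receives 4 / (2 * 3) ≈ 0.667.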

random_state: int, RandomState instance, default=None

Used when solver is ‘sag’, ‘saga’ or ‘liblinear’ to shuffle the data. See the scikit-learn Glossary for details.

solver: {‘lbfgs’, ‘liblinear’, ‘newton-cg’, ‘newton-cholesky’, ‘sag’, ‘saga’}, default=’lbfgs’

Algorithm to use in the optimization problem. Default is ‘lbfgs’. To choose a solver, you might want to consider the following aspects:

  • For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones;

  • For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss;

  • ‘liblinear’ is limited to one-versus-rest schemes.

  • ‘newton-cholesky’ is a good choice for n_samples >> n_features, especially with one-hot encoded categorical features with rare categories. Note that it is limited to binary classification and the one-versus-rest reduction for multiclass classification. Be aware that the memory usage of this solver has a quadratic dependency on n_features because it explicitly computes the Hessian matrix.

The choice of algorithm depends on the penalty chosen. Supported penalties by solver:

  • ‘lbfgs’ - [‘l2’, None]

  • ‘liblinear’ - [‘l1’, ‘l2’]

  • ‘newton-cg’ - [‘l2’, None]

  • ‘newton-cholesky’ - [‘l2’, None]

  • ‘sag’ - [‘l2’, None]

  • ‘saga’ - [‘elasticnet’, ‘l1’, ‘l2’, None]
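The solver/penalty table above can be transcribed into a small lookup for catching invalid combinations before submitting a job. This is a hypothetical helper, not part of Snowflake ML; scikit-learn performs its own validation when the model is fitted.

```python
from typing import Optional

# Supported penalties per solver, transcribed from the list above.
SUPPORTED_PENALTIES = {
    "lbfgs": {"l2", None},
    "liblinear": {"l1", "l2"},
    "newton-cg": {"l2", None},
    "newton-cholesky": {"l2", None},
    "sag": {"l2", None},
    "saga": {"elasticnet", "l1", "l2", None},
}


def check_solver_penalty(solver: str, penalty: Optional[str]) -> None:
    """Raise ValueError if `solver` cannot be combined with `penalty`."""
    if solver not in SUPPORTED_PENALTIES:
        raise ValueError(f"unknown solver {solver!r}")
    allowed = SUPPORTED_PENALTIES[solver]
    if penalty not in allowed:
        options = ", ".join(sorted(repr(p) for p in allowed))
        raise ValueError(f"solver {solver!r} supports penalties {options}, got {penalty!r}")


check_solver_penalty("saga", "elasticnet")  # accepted silently
try:
    check_solver_penalty("lbfgs", "l1")
except ValueError as err:
    print(err)
```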

max_iter: int, default=100

Maximum number of iterations taken for the solvers to converge.

multi_class: {‘auto’, ‘ovr’, ‘multinomial’}, default=’auto’

If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=’liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=’liblinear’, and otherwise selects ‘multinomial’.
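The resolution rule for multi_class can be written as a short function. This is a hypothetical sketch of the rules stated above, not the library's implementation. As an assumption based on the solver notes (‘newton-cholesky’ is limited to binary classification and one-versus-rest), it treats ‘newton-cholesky’ the same way as ‘liblinear’.

```python
def resolve_multi_class(multi_class: str, solver: str, n_classes: int) -> str:
    """Return 'ovr' or 'multinomial' following the rules described for multi_class."""
    if multi_class not in ("auto", "ovr", "multinomial"):
        raise ValueError(f"invalid multi_class {multi_class!r}")
    # Assumption: 'newton-cholesky' is OvR-only like 'liblinear', per the solver notes.
    ovr_only = solver in ("liblinear", "newton-cholesky")
    if multi_class == "multinomial" and ovr_only:
        raise ValueError(f"'multinomial' is unavailable when solver={solver!r}")
    if multi_class == "auto":
        # Binary data or an OvR-only solver -> 'ovr'; otherwise 'multinomial'.
        return "ovr" if n_classes <= 2 or ovr_only else "multinomial"
    return multi_class


print(resolve_multi_class("auto", "lbfgs", n_classes=2))
print(resolve_multi_class("auto", "lbfgs", n_classes=3))
print(resolve_multi_class("auto", "liblinear", n_classes=3))
```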

verbose: int, default=0

For the liblinear and lbfgs solvers, set verbose to any positive number for verbosity.

warm_start: bool, default=False

When set to True, reuse the solution of the previous call to fit as initialization; otherwise, erase the previous solution. This has no effect with the liblinear solver. See the scikit-learn Glossary.

n_jobs: int, default=None

Number of CPU cores used when parallelizing over classes if multi_class=’ovr’. This parameter is ignored when the solver is set to ‘liblinear’, regardless of whether ‘multi_class’ is specified. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors. See the scikit-learn Glossary for more details.

l1_ratio: float, default=None

The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty='elasticnet'. Setting l1_ratio=0 is equivalent to using penalty='l2', while setting l1_ratio=1 is equivalent to using penalty='l1'. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
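To make the roles of C and l1_ratio concrete, here is a NumPy sketch of the penalized objective. It assumes the formulation given in the scikit-learn user guide for logistic regression: binary labels y in {0, 1}, no sample weights, an unpenalized intercept, and the elastic-net penalty r(w) = l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2. The function names are hypothetical; no solver is involved.

```python
import numpy as np


def elastic_net_penalty(w, l1_ratio):
    """r(w) = l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2."""
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must satisfy 0 <= l1_ratio <= 1")
    w = np.asarray(w, dtype=float)
    return float(l1_ratio * np.abs(w).sum() + (1.0 - l1_ratio) / 2.0 * float(w @ w))


def objective(w, b, X, y, C, l1_ratio):
    """C * (summed log-loss, labels y in {0, 1}) + r(w); the intercept b is not penalized."""
    z = np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + b
    signs = 2.0 * np.asarray(y, dtype=float) - 1.0  # map {0, 1} -> {-1, +1}
    log_loss = np.logaddexp(0.0, -signs * z).sum()  # sum of log(1 + exp(-s_i * z_i))
    return float(C * log_loss + elastic_net_penalty(w, l1_ratio))


w = np.array([3.0, -4.0])
for r in (0.0, 0.5, 1.0):
    print(r, elastic_net_penalty(w, r))
```

Because C multiplies only the data term, a smaller C gives the penalty relatively more weight, which is why smaller values specify stronger regularization. The endpoints also show the stated equivalences: l1_ratio=1 gives the L1 norm (7.0 here) and l1_ratio=0 gives half the squared L2 norm (12.5 here).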

input_cols: Optional[Union[str, List[str]]]

A string or list of strings representing column names that contain features. If this parameter is not specified, all columns in the input DataFrame except the columns specified by the label_cols and sample_weight_col parameters are considered input columns.

label_cols: Optional[Union[str, List[str]]]

A string or list of strings representing column names that contain labels. This parameter is required for estimators, because there is no way to infer these columns. If it is not specified, the object is fitted without labels (like a transformer).

output_cols: Optional[Union[str, List[str]]]

A string or list of strings representing column names that will store the output of predict and transform operations. The length of output_cols must match the expected number of output columns from the specific estimator or transformer class used. If this parameter is not specified, output column names are derived by adding an OUTPUT_ prefix to the label column names. These inferred output column names work for an estimator’s predict() method, but output_cols must be set explicitly for transformers.

sample_weight_col: Optional[str]

A string representing the column name containing the examples’ weights. This argument is only required when working with weighted datasets.

drop_input_cols: Optional[bool], default=False

If set to True, the output of the predict() and transform() methods will not contain the input columns.
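The default column rules described for input_cols, label_cols, sample_weight_col and output_cols can be summarized in a hypothetical helper. It mirrors the documented defaults only and is not the library's internal code. The column names AGE, INCOME, WEIGHT and LABEL are made up for the example; in practice the same names would be passed to the estimator, for example LogisticRegression(label_cols="LABEL", sample_weight_col="WEIGHT").

```python
from typing import Iterable, List, Optional, Tuple, Union

Cols = Optional[Union[str, Iterable[str]]]


def _as_list(cols: Cols) -> List[str]:
    """Normalize None, a single column name, or an iterable of names to a list."""
    if cols is None:
        return []
    if isinstance(cols, str):
        return [cols]
    return list(cols)


def infer_columns(
    all_columns: List[str],
    label_cols: Cols = None,
    sample_weight_col: Optional[str] = None,
    input_cols: Cols = None,
    output_cols: Cols = None,
) -> Tuple[List[str], List[str]]:
    """Return (input columns, output columns) following the documented defaults."""
    labels = _as_list(label_cols)
    excluded = set(labels)
    if sample_weight_col is not None:
        excluded.add(sample_weight_col)
    # Default inputs: every column except the label and sample-weight columns.
    inputs = _as_list(input_cols) or [c for c in all_columns if c not in excluded]
    # Default outputs: label column names with an OUTPUT_ prefix.
    outputs = _as_list(output_cols) or ["OUTPUT_" + c for c in labels]
    return inputs, outputs


print(infer_columns(["AGE", "INCOME", "WEIGHT", "LABEL"], label_cols="LABEL", sample_weight_col="WEIGHT"))
```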

Methods

decision_function(dataset[, output_cols_prefix])

Predict confidence scores for samples. For more details on this function, see sklearn.linear_model.LogisticRegression.decision_function

fit(dataset)

Fit the model according to the given training data. For more details on this function, see sklearn.linear_model.LogisticRegression.fit

predict(dataset)

Predict class labels for samples in the dataset. For more details on this function, see sklearn.linear_model.LogisticRegression.predict
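The relation between decision_function and predict can be sketched in NumPy from fitted parameters. The sketch follows our understanding of scikit-learn's linear-classifier behavior: coef has shape (1, n_features) for binary problems and (n_classes, n_features) otherwise, and classes lists the labels in the order the model uses. The function names and numbers are hypothetical, and the parameters are passed explicitly rather than read from a fitted object.

```python
import numpy as np


def decision_scores(X, coef, intercept):
    """Confidence scores: X @ coef.T + intercept."""
    scores = (
        np.asarray(X, dtype=float) @ np.asarray(coef, dtype=float).T
        + np.asarray(intercept, dtype=float)
    )
    # Binary problems have a single row in coef, so return a 1-D array of scores.
    return scores.ravel() if scores.shape[1] == 1 else scores


def predict_labels(X, coef, intercept, classes):
    """Map confidence scores to class labels."""
    scores = decision_scores(X, coef, intercept)
    classes = np.asarray(classes)
    if scores.ndim == 1:
        # Binary: a positive score selects classes[1], otherwise classes[0].
        return classes[(scores > 0).astype(int)]
    # Multiclass: the class with the highest score wins.
    return classes[np.argmax(scores, axis=1)]


X = np.array([[1.0, 2.0], [-3.0, 0.0]])
print(decision_scores(X, [[1.0, 0.5]], [-1.0]))
print(predict_labels(X, [[1.0, 0.5]], [-1.0], ["no", "yes"]))
```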

predict_log_proba(dataset[, output_cols_prefix])

Log of probability estimates. For more details on this function, see sklearn.linear_model.LogisticRegression.predict_log_proba

predict_proba(dataset[, output_cols_prefix])

Probability estimates. For more details on this function, see sklearn.linear_model.LogisticRegression.predict_proba
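A NumPy sketch of how decision scores become probabilities, following the behavior scikit-learn documents for LogisticRegression.predict_proba. With ‘multinomial’, a softmax is applied. Otherwise (one-vs-rest), the logistic function is applied per class and the results are normalized across classes. The binary ‘multinomial’ case, where a single score s is expanded to [-s, s] before the softmax, reflects our reading of scikit-learn's implementation and should be treated as an assumption. predict_log_proba corresponds to the elementwise log of these probabilities. All function names are hypothetical.

```python
import numpy as np


def _sigmoid(z):
    # 0.5 * (1 + tanh(z / 2)) equals 1 / (1 + exp(-z)) but cannot overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def _softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift each row for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def proba_from_scores(scores, multi_class):
    """Turn decision_function output into class probabilities (rows sum to 1)."""
    scores = np.asarray(scores, dtype=float)
    if multi_class == "multinomial":
        # Assumption: binary scores are expanded to [-s, s] before the softmax.
        return _softmax(np.column_stack([-scores, scores]) if scores.ndim == 1 else scores)
    # One-vs-rest: logistic function per class, then normalize across classes.
    p = _sigmoid(scores)
    if p.ndim == 1:
        return np.column_stack([1.0 - p, p])
    return p / p.sum(axis=1, keepdims=True)


def log_proba_from_scores(scores, multi_class):
    """Elementwise log of proba_from_scores."""
    return np.log(proba_from_scores(scores, multi_class))


scores = np.array([[2.0, 0.5, -1.0]])
print(proba_from_scores(scores, "multinomial"))
print(proba_from_scores(scores, "ovr"))
```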

score(dataset)

Return the mean accuracy on the given test data and labels. For more details on this function, see sklearn.linear_model.LogisticRegression.score

set_input_cols(input_cols)

Input columns setter.

to_sklearn()

Get sklearn.linear_model.LogisticRegression object.

Attributes

model_signatures

Returns the model signatures of the current class.