snowflake.ml.modeling.linear_model.LogisticRegression
- class snowflake.ml.modeling.linear_model.LogisticRegression(*, penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None, input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, label_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False, sample_weight_col: Optional[str] = None)
Bases: BaseTransformer
Logistic Regression (aka logit, MaxEnt) classifier. For more details on this class, see sklearn.linear_model.LogisticRegression.
- penalty: {‘l1’, ‘l2’, ‘elasticnet’, None}, default=‘l2’
Specify the norm of the penalty:
None: no penalty is added;
‘l2’: add an L2 penalty term (the default);
‘l1’: add an L1 penalty term;
‘elasticnet’: add both L1 and L2 penalty terms.
- dual: bool, default=False
Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.
- tol: float, default=1e-4
Tolerance for stopping criteria.
- C: float, default=1.0
Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
- fit_intercept: bool, default=True
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
- intercept_scaling: float, default=1
Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.
Note: the synthetic feature weight is subject to l1/l2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.
- class_weight: dict or ‘balanced’, default=None
Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one.
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
- random_state: int, RandomState instance, default=None
Used when solver == ‘sag’, ‘saga’ or ‘liblinear’ to shuffle the data. See Glossary for details.
- solver: {‘lbfgs’, ‘liblinear’, ‘newton-cg’, ‘newton-cholesky’, ‘sag’, ‘saga’}, default=‘lbfgs’
Algorithm to use in the optimization problem. Default is ‘lbfgs’. To choose a solver, you might want to consider the following aspects:
For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones;
For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss;
‘liblinear’ is limited to one-versus-rest schemes.
‘newton-cholesky’ is a good choice for n_samples >> n_features, especially with one-hot encoded categorical features with rare categories. Note that it is limited to binary classification and the one-versus-rest reduction for multiclass classification. Be aware that the memory usage of this solver has a quadratic dependency on n_features because it explicitly computes the Hessian matrix.
The choice of solver also depends on the penalty. Supported penalties by solver:
‘lbfgs’ - [‘l2’, None]
‘liblinear’ - [‘l1’, ‘l2’]
‘newton-cg’ - [‘l2’, None]
‘newton-cholesky’ - [‘l2’, None]
‘sag’ - [‘l2’, None]
‘saga’ - [‘elasticnet’, ‘l1’, ‘l2’, None]
- max_iter: int, default=100
Maximum number of iterations taken for the solvers to converge.
- multi_class: {‘auto’, ‘ovr’, ‘multinomial’}, default=‘auto’
If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=‘liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=‘liblinear’, and otherwise selects ‘multinomial’.
- verbose: int, default=0
For the liblinear and lbfgs solvers, set verbose to any positive number for verbosity.
- warm_start: bool, default=False
When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution. Has no effect with the liblinear solver. See the Glossary.
- n_jobs: int, default=None
Number of CPU cores used when parallelizing over classes if multi_class=‘ovr’. This parameter is ignored when the solver is set to ‘liblinear’ regardless of whether ‘multi_class’ is specified or not. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
- l1_ratio: float, default=None
The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty='elasticnet'. Setting l1_ratio=0 is equivalent to using penalty='l2', while setting l1_ratio=1 is equivalent to using penalty='l1'. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
- input_cols: Optional[Union[str, List[str]]]
A string or list of strings representing column names that contain features. If this parameter is not specified, all columns in the input DataFrame except the columns specified by the label_cols and sample_weight_col parameters are considered input columns.
- label_cols: Optional[Union[str, List[str]]]
A string or list of strings representing column names that contain labels. This parameter is required for estimators, as there is no way to infer these columns. If it is not specified, the object is fitted without labels (like a transformer).
- output_cols: Optional[Union[str, List[str]]]
A string or list of strings representing column names that will store the output of predict and transform operations. The length of output_cols must match the expected number of output columns from the specific estimator or transformer class used. If this parameter is not specified, output column names are derived by adding an OUTPUT_ prefix to the label column names. These inferred output column names work for an estimator’s predict() method, but output_cols must be set explicitly for transformers.
- sample_weight_col: Optional[str]
A string representing the column name containing the examples’ weights. This argument is only required when working with weighted datasets.
- drop_input_cols: Optional[bool], default=False
If set to True, the output of the predict() and transform() methods will not contain the input columns.
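As an illustration of the “balanced” class_weight mode described above, the following minimal sketch evaluates n_samples / (n_classes * np.bincount(y)) in plain Python. The helper name balanced_class_weights and the toy labels are hypothetical, and the sketch uses collections.Counter in place of np.bincount so that it also accepts non-integer labels; it is not the library's implementation.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class_label: weight} using n_samples / (n_classes * count).

    Hypothetical helper for illustration only; it mirrors the formula
    quoted in the class_weight description, not the library's own code.
    """
    counts = Counter(labels)      # number of occurrences of each class
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy data: three "no" rows and one "yes" row.
weights = balanced_class_weights(["no", "no", "no", "yes"])
# The rare class "yes" gets weight 4 / (2 * 1) = 2.0,
# the frequent class "no" gets weight 4 / (2 * 3) = 2/3.
```

With perfectly balanced labels every weight comes out as 1.0, so the mode only changes the fit when class frequencies differ.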
Methods
- decision_function(dataset[, output_cols_prefix])
Predict confidence scores for samples. For more details on this function, see sklearn.linear_model.LogisticRegression.decision_function.
- fit(dataset)
Fit the model according to the given training data. For more details on this function, see sklearn.linear_model.LogisticRegression.fit.
- predict(dataset)
Predict class labels for samples in X. For more details on this function, see sklearn.linear_model.LogisticRegression.predict.
- predict_log_proba(dataset[, output_cols_prefix])
Predict the logarithm of probability estimates. For more details on this function, see sklearn.linear_model.LogisticRegression.predict_log_proba.
- predict_proba(dataset[, output_cols_prefix])
Probability estimates. For more details on this function, see sklearn.linear_model.LogisticRegression.predict_proba.
- score(dataset)
Return the mean accuracy on the given test data and labels. For more details on this function, see sklearn.linear_model.LogisticRegression.score.
- set_input_cols(input_cols)
Input columns setter.
- to_sklearn()
Get sklearn.linear_model.LogisticRegression object.
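To relate the outputs of decision_function, predict_proba, predict_log_proba and predict, here is a minimal sketch for the binary case. It assumes the standard logistic model, in which the positive-class probability is the sigmoid of the decision score, and classes labelled 0 and 1. The function names log_sigmoid and binary_outputs are hypothetical, and multiclass problems need a different computation.

```python
import math


def log_sigmoid(score: float) -> float:
    """Numerically stable log(1 / (1 + exp(-score)))."""
    if score >= 0:
        return -math.log1p(math.exp(-score))
    return score - math.log1p(math.exp(score))


def binary_outputs(score: float) -> dict:
    """Map one decision score to the quantities the methods report.

    Hypothetical illustration for a binary problem with classes 0 and 1.
    """
    log_p1 = log_sigmoid(score)    # log P(class 1)
    log_p0 = log_sigmoid(-score)   # log P(class 0) = log(1 - P(class 1))
    return {
        "decision": score,
        "proba": (math.exp(log_p0), math.exp(log_p1)),
        "log_proba": (log_p0, log_p1),
        "label": 1 if score > 0 else 0,  # a positive score means class 1
    }


example = binary_outputs(2.0)
```

Working in log space first avoids taking the logarithm of a probability that has rounded to zero for very large or very small scores.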
Attributes
- model_signatures
Returns the model signature of the current class.