snowflake.ml.modeling.preprocessing.KBinsDiscretizer
- class snowflake.ml.modeling.preprocessing.KBinsDiscretizer(*, n_bins: Union[int, List[int]] = 5, encode: str = 'onehot', strategy: str = 'quantile', input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, passthrough_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False)
Bases: BaseTransformer
Bin continuous data into intervals.
- Args:
- n_bins: int or array-like of shape (n_features,), default=5
The number of bins to produce. Raises ValueError if n_bins < 2.
- encode: {'onehot', 'onehot-dense', 'ordinal'}, default='onehot'
Method used to encode the transformed result.
'onehot': Encode the transformed result with one-hot encoding and return a sparse representation.
'onehot-dense': Encode the transformed result with one-hot encoding and return a separate column for each encoded value.
'ordinal': Return the bin identifier encoded as an integer value.
- strategy: {'uniform', 'quantile'}, default='quantile'
Strategy used to define the widths of the bins.
'uniform': All bins in each feature have identical widths.
'quantile': All bins in each feature have the same number of points.
- input_cols: str or Iterable [column_name], default=None
Single or multiple input columns.
- output_cols: str or Iterable [column_name], default=None
Single or multiple output columns.
- passthrough_cols: str or Iterable [column_name], default=None
Column names to exclude from all operations (such as train, transform, or inference). These columns remain untouched throughout the process. This option is useful when input_cols is inferred automatically but specific columns, such as index columns, must not be used during training or inference.
- drop_input_cols: boolean, default=False
Remove input columns from the output if set to True.
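The two strategy values can be illustrated without a Snowflake session. The following is a minimal pure-Python sketch, not the library's implementation: the helper names uniform_edges and quantile_edges are hypothetical, and the quantile interpolation rule is an assumption chosen for illustration.

```python
from typing import List


def uniform_edges(values: List[float], n_bins: int) -> List[float]:
    """Equal-width edges between min and max (illustrates strategy='uniform')."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # n_bins left edges plus the maximum as the final right edge.
    return [lo + i * width for i in range(n_bins)] + [hi]


def quantile_edges(values: List[float], n_bins: int) -> List[float]:
    """Edges at evenly spaced quantiles (illustrates strategy='quantile')."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    data = sorted(values)
    edges = []
    for i in range(n_bins + 1):
        # Assumed rule: linear interpolation between neighboring order statistics.
        pos = i * (len(data) - 1) / n_bins
        lower = int(pos)
        upper = min(lower + 1, len(data) - 1)
        frac = pos - lower
        edges.append(data[lower] + frac * (data[upper] - data[lower]))
    return edges


# Illustrative data with one large outlier.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
print(uniform_edges(values, 3))
print(quantile_edges(values, 3))
```

With this sample, the outlier stretches the uniform bins so that eight of the nine points fall into the first bin, while the quantile edges place three points in each bin.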
- Attributes:
- bin_edges_: ndarray of ndarray of shape (n_features,)
The edges of each bin. Contains arrays of varying shapes (n_bins_,).
- n_bins_: ndarray of shape (n_features,), dtype=np.int_
Number of bins per feature.
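Because n_bins accepts either a single int or one value per feature, while n_bins_ always holds one entry per feature, the relationship can be sketched as follows. The helper resolve_n_bins is hypothetical and only mirrors the documented behavior (one count per feature, ValueError when a count is below 2); the length check is an added assumption.

```python
from typing import List, Union


def resolve_n_bins(n_bins: Union[int, List[int]], n_features: int) -> List[int]:
    """Hypothetical helper: expand n_bins to one validated count per feature."""
    # A single int applies to every feature; a list is taken as given.
    per_feature = [n_bins] * n_features if isinstance(n_bins, int) else list(n_bins)
    if len(per_feature) != n_features:
        # Assumed check: a list must supply exactly one count per feature.
        raise ValueError(f"expected {n_features} bin counts, got {len(per_feature)}")
    for count in per_feature:
        if count < 2:
            raise ValueError(f"n_bins must be at least 2, got {count}")
    return per_feature


print(resolve_n_bins(5, 3))
print(resolve_n_bins([3, 4, 10], 3))
```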
Methods
- fit(dataset): Fit KBinsDiscretizer with dataset.
- get_input_cols(): Input columns getter.
- get_label_cols(): Label column getter.
- get_output_cols(): Get output column names.
- get_params([deep]): Get parameters for this transformer.
- get_passthrough_cols(): Passthrough columns getter.
- get_sample_weight_col(): Sample weight column getter.
- get_sklearn_args([default_sklearn_obj, ...]): Get sklearn keyword arguments.
- set_drop_input_cols([drop_input_cols])
- set_input_cols(input_cols): Input columns setter.
- set_label_cols(label_cols): Label column setter.
- set_output_cols(output_cols): Output columns setter.
- set_params(**params): Set the parameters of this transformer.
- set_passthrough_cols(passthrough_cols): Passthrough columns setter.
- set_sample_weight_col(sample_weight_col): Sample weight column setter.
- to_lightgbm()
- to_sklearn()
- to_xgboost()
- transform(dataset): Discretize the data.
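To make the effect of the encode option on discretized values concrete, here is a minimal sketch of the 'ordinal' and 'onehot-dense' encodings for a single value. It is an illustration under stated assumptions, not the library's code: to_ordinal and to_onehot_dense are hypothetical names, bins are assumed to be closed on the left, and values outside the fitted range are assumed to be clipped to the first or last bin.

```python
from bisect import bisect_right
from typing import List


def to_ordinal(value: float, edges: List[float]) -> int:
    """Return the bin index for value, given sorted bin edges."""
    # Only interior edges separate bins, so out-of-range values
    # land in the first or last bin (assumed clipping behavior).
    return bisect_right(edges[1:-1], value)


def to_onehot_dense(bin_index: int, n_bins: int) -> List[int]:
    """Return one 0/1 entry per bin, with a 1 at bin_index."""
    return [1 if i == bin_index else 0 for i in range(n_bins)]


# Illustrative edges for three bins: [0, 10), [10, 20), [20, 30].
edges = [0.0, 10.0, 20.0, 30.0]
n_bins = len(edges) - 1

for value in [5.0, 10.0, 25.0, 35.0]:
    index = to_ordinal(value, edges)
    print(value, index, to_onehot_dense(index, n_bins))
```

In the 'onehot-dense' case each list entry would correspond to a separate output column, while 'ordinal' keeps a single integer column per input feature.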