snowflake.ml.modeling.preprocessing.KBinsDiscretizer
- class snowflake.ml.modeling.preprocessing.KBinsDiscretizer(*, n_bins: Union[int, List[int]] = 5, encode: str = 'onehot', strategy: str = 'quantile', input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False)
Bases:
BaseTransformer
Bin continuous data into intervals.
- Args:
- n_bins: int or array-like of shape (n_features,), default=5
The number of bins to produce. Raises ValueError if n_bins < 2.
- encode: {'onehot', 'onehot-dense', 'ordinal'}, default='onehot'
Method used to encode the transformed result.
'onehot': Encode the transformed result with one-hot encoding and return a sparse representation.
'onehot-dense': Encode the transformed result with one-hot encoding and return a separate column for each encoded value.
'ordinal': Return the bin identifier encoded as an integer value.
- strategy: {'uniform', 'quantile'}, default='quantile'
Strategy used to define the widths of the bins.
'uniform': All bins in each feature have identical widths.
'quantile': All bins in each feature have the same number of points.
- input_cols: str or Iterable [column_name], default=None
Single or multiple input columns.
- output_cols: str or Iterable [column_name], default=None
Single or multiple output columns.
- drop_input_cols: boolean, default=False
Remove input columns from the output if set to True.
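The difference between the two strategies, and between the 'ordinal' and 'onehot-dense' encodings, is easiest to see on a small column with an outlier. The following is a minimal pure-Python sketch of the general binning technique, not Snowflake's implementation; the helper names (`uniform_edges`, `quantile_edges`, `ordinal`, `onehot_dense`) and the sample values are hypothetical, and the quantile rule shown (linear interpolation) is an assumption.

```python
from bisect import bisect_right


def uniform_edges(values, n_bins):
    """Equal-width edges between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def quantile_edges(values, n_bins):
    """Edges at evenly spaced quantiles, using linear interpolation."""
    data = sorted(values)
    edges = []
    for i in range(n_bins + 1):
        pos = i * (len(data) - 1) / n_bins
        lower = int(pos)
        upper = min(lower + 1, len(data) - 1)
        edges.append(data[lower] + (data[upper] - data[lower]) * (pos - lower))
    return edges


def ordinal(values, edges):
    """Bin index per value; the last bin includes its right edge."""
    last_bin = len(edges) - 2
    return [min(bisect_right(edges, v) - 1, last_bin) for v in values]


def onehot_dense(bin_ids, n_bins):
    """One 0/1 indicator column per bin."""
    return [[1 if b == k else 0 for k in range(n_bins)] for b in bin_ids]


values = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]  # hypothetical column with an outlier

u_edges = uniform_edges(values, 3)
q_edges = quantile_edges(values, 3)
u_ids = ordinal(values, u_edges)
q_ids = ordinal(values, q_edges)

print(u_edges, u_ids)
print(q_edges, q_ids)
print(onehot_dense(q_ids, 3))
```

With equal-width bins the outlier stretches the range, so five of the six values fall into the first bin. With quantile edges each of the three bins receives two values.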
- Attributes:
- bin_edges_: ndarray of ndarray of shape (n_features,)
The edges of each bin. Contains arrays of varying shapes (n_bins_, ).
- n_bins_: ndarray of shape (n_features,), dtype=np.int_
Number of bins per feature.
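Because n_bins may differ per feature, the per-feature edge arrays can have different lengths. The sketch below illustrates that ragged layout in plain Python with equal-width edges; the column names and values are hypothetical. In this sketch each edge list holds one more entry than its bin count, since k bins need k + 1 boundaries; whether the fitted attribute is stored exactly this way is an assumption.

```python
def uniform_edges(column, n_bins):
    """Equal-width edges from the column minimum to its maximum."""
    lo, hi = min(column), max(column)
    return [lo + i * (hi - lo) / n_bins for i in range(n_bins + 1)]


# Hypothetical two-feature dataset, one list of values per column.
columns = {
    "AGE": [20.0, 30.0, 40.0, 60.0],
    "INCOME": [0.0, 25.0, 50.0, 100.0],
}
n_bins = [2, 4]  # one bin count per feature, in column order

bin_edges = [uniform_edges(col, k) for col, k in zip(columns.values(), n_bins)]

for name, edges in zip(columns, bin_edges):
    print(name, edges)
```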
Methods
fit(dataset): Fit KBinsDiscretizer with dataset.
get_output_cols(): Get output column names.
transform(dataset): Discretize the data.
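A typical fit-then-transform call might look like the sketch below. It uses only the constructor arguments and methods documented on this page; the wrapper function `bin_columns` and the column names are hypothetical, and the sketch assumes `dataset` is a DataFrame type accepted by `fit` and `transform` that contains the named input columns.

```python
def bin_columns(dataset):
    """Fit a KBinsDiscretizer on `dataset` and return the discretized result."""
    # Imported inside the function so the sketch can be loaded even where
    # the snowflake-ml-python package is not installed.
    from snowflake.ml.modeling.preprocessing import KBinsDiscretizer

    discretizer = KBinsDiscretizer(
        n_bins=[3, 5],                          # one bin count per input column
        encode="ordinal",                       # integer bin identifiers
        strategy="uniform",                     # equal-width bins
        input_cols=["AGE", "INCOME"],           # hypothetical column names
        output_cols=["AGE_BIN", "INCOME_BIN"],  # hypothetical column names
        drop_input_cols=True,                   # keep only the binned columns
    )
    discretizer.fit(dataset)
    return discretizer.transform(dataset)
```

Calling `fit` and `transform` as separate statements avoids relying on the return value of `fit`, which this page does not specify.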