snowflake.ml.modeling.preprocessing.KBinsDiscretizer

class snowflake.ml.modeling.preprocessing.KBinsDiscretizer(*, n_bins: Union[int, List[int]] = 5, encode: str = 'onehot', strategy: str = 'quantile', input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, passthrough_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False)

Bases: BaseTransformer

Bin continuous data into intervals.

Args:
n_bins: int or array-like of shape (n_features,), default=5

The number of bins to produce. Raises ValueError if n_bins < 2.

encode: {'onehot', 'onehot-dense', 'ordinal'}, default='onehot'

Method used to encode the transformed result.

  • 'onehot': Encode the transformed result with one-hot encoding and return a sparse representation.

  • 'onehot-dense': Encode the transformed result with one-hot encoding and return a separate column for each encoded value.

  • 'ordinal': Return the bin identifier encoded as an integer value.
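The relationship between the three encodings can be illustrated without the library. The following is a minimal sketch in plain Python, not the library's implementation: the helper names `to_onehot_dense` and `to_onehot_sparse` are hypothetical, and the sparse form is shown simply as (row, bin) pairs for the non-zero entries.

```python
def to_onehot_dense(bin_ids, n_bins):
    """Expand ordinal bin identifiers into one indicator column per bin."""
    return [[1.0 if b == k else 0.0 for k in range(n_bins)] for b in bin_ids]


def to_onehot_sparse(bin_ids):
    """Keep only the positions of the non-zero indicators as (row, bin) pairs."""
    return [(row, b) for row, b in enumerate(bin_ids)]


# Ordinal encoding: one integer bin identifier per row.
ordinal = [0, 2, 1]

dense = to_onehot_dense(ordinal, n_bins=3)
sparse = to_onehot_sparse(ordinal)

print(dense)   # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(sparse)  # [(0, 0), (1, 2), (2, 1)]
```

All three forms carry the same information; they differ only in how many output columns are produced and whether zeros are stored.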

strategy: {'uniform', 'quantile'}, default='quantile'

Strategy used to define the widths of the bins.

  • 'uniform': All bins in each feature have identical widths.

  • 'quantile': All bins in each feature have the same number of points.
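The difference between the two strategies is easiest to see on skewed data. The sketch below computes bin edges for one feature in plain Python. It is an illustration only: the helper names are hypothetical, and the quantile edges use linear interpolation between sorted values, which is an assumption and may not match the library's exact quantile method.

```python
def uniform_edges(values, n_bins):
    """Edges that split [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def quantile_edges(values, n_bins):
    """Edges at evenly spaced quantiles, so bins hold similar numbers of points."""
    ordered = sorted(values)
    last = len(ordered) - 1
    edges = []
    for i in range(n_bins + 1):
        pos = last * i / n_bins          # fractional index into the sorted data
        below = int(pos)
        above = min(below + 1, last)
        frac = pos - below
        edges.append(ordered[below] + (ordered[above] - ordered[below]) * frac)
    return edges


data = [1, 2, 3, 4, 5, 6, 7, 100]        # one outlier stretches the range

u = uniform_edges(data, 4)
q = quantile_edges(data, 4)

print(u)  # [1.0, 25.75, 50.5, 75.25, 100.0]
print(q)  # [1.0, 2.75, 4.5, 6.25, 100.0]
```

With uniform edges, seven of the eight points fall into the first bin; with quantile edges, each bin receives two points.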

input_cols: str or Iterable [column_name], default=None

Single or multiple input columns.

output_cols: str or Iterable [column_name], default=None

Single or multiple output columns.

passthrough_cols: str or Iterable [column_name], default=None

A string or a list of strings naming columns to exclude from all operations (such as train, transform, or inference). These columns remain untouched throughout the process. This option is useful when input_cols is inferred automatically but specific columns, such as index columns, must not be used during training or inference.

drop_input_cols: boolean, default=False

If True, remove the input columns from the output.

Attributes:
bin_edges_: ndarray of ndarray of shape (n_features,)

The edges of each bin. Contains arrays of varying shapes (n_bins_, )

n_bins_: ndarray of shape (n_features,), dtype=np.int_

Number of bins per feature.
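To show how fitted edges relate to bin identifiers, here is a minimal sketch for a single feature. It makes two assumptions that this page does not confirm: the edge list includes both outer boundaries (so n bins have n + 1 edges), and a value equal to an interior edge goes to the higher bin, with out-of-range values clipped to the first or last bin. The helper `assign_bins` is hypothetical.

```python
from bisect import bisect_right


def assign_bins(values, edges):
    """Map each value to a bin index using only the interior edges."""
    interior = edges[1:-1]               # outer edges are not needed: values clip
    return [bisect_right(interior, v) for v in values]


edges = [0.0, 10.0, 20.0, 30.0]          # hypothetical edges for one feature
n_bins = len(edges) - 1                  # 3 bins under the assumption above

ids = assign_bins([-5.0, 0.0, 9.9, 10.0, 25.0, 30.0, 99.0], edges)
print(ids)  # [0, 0, 0, 1, 2, 2, 2]
```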

Methods

fit(dataset)

Fit KBinsDiscretizer with dataset.

get_input_cols()

Input columns getter.

get_label_cols()

Label column getter.

get_output_cols()

Get output column names.

get_params([deep])

Get parameters for this transformer.

get_passthrough_cols()

Passthrough columns getter.

get_sample_weight_col()

Sample weight column getter.

get_sklearn_args([default_sklearn_obj, ...])

Get sklearn keyword arguments.

set_drop_input_cols([drop_input_cols])

set_input_cols(input_cols)

Input columns setter.

set_label_cols(label_cols)

Label column setter.

set_output_cols(output_cols)

Output columns setter.

set_params(**params)

Set the parameters of this transformer.

set_passthrough_cols(passthrough_cols)

Passthrough columns setter.

set_sample_weight_col(sample_weight_col)

Sample weight column setter.

to_lightgbm()

to_sklearn()

to_xgboost()

transform(dataset)

Discretize the data.
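A typical fit-then-transform flow might look like the sketch below. It uses only the constructor arguments and the fit(dataset) and transform(dataset) methods listed on this page. The column names and the wrapper function `bin_columns` are hypothetical, and `train_df` and `score_df` are assumed to be datasets of a type the library accepts.

```python
def bin_columns(train_df, score_df):
    """Fit a discretizer on train_df, then discretize score_df."""
    # Imported inside the function so the sketch loads even where the
    # snowflake-ml-python package is not installed.
    from snowflake.ml.modeling.preprocessing import KBinsDiscretizer

    discretizer = KBinsDiscretizer(
        n_bins=4,
        encode="ordinal",
        strategy="uniform",
        input_cols=["AGE", "INCOME"],            # hypothetical column names
        output_cols=["AGE_BIN", "INCOME_BIN"],   # hypothetical column names
        drop_input_cols=True,
    )
    discretizer.fit(train_df)                    # learns the bin edges
    return discretizer.transform(score_df)       # applies them to new data
```

Fitting and transforming are kept as separate calls so that edges learned on one dataset can be reused on another.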