You are viewing documentation about an older version (1.0.9).

snowflake.ml.modeling.preprocessing.KBinsDiscretizer

class snowflake.ml.modeling.preprocessing.KBinsDiscretizer(*, n_bins: Union[int, List[int]] = 5, encode: str = 'onehot', strategy: str = 'quantile', input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False)

Bases: BaseTransformer

Bin continuous data into intervals.

Args:
n_bins: int or array-like of shape (n_features,), default=5

The number of bins to produce. Raises ValueError if n_bins < 2.

encode: {'onehot', 'onehot-dense', 'ordinal'}, default='onehot'

Method used to encode the transformed result.

  • 'onehot': Encode the transformed result with one-hot encoding and return a sparse representation.

  • 'onehot-dense': Encode the transformed result with one-hot encoding and return a separate column for each encoded value.

  • 'ordinal': Return the bin identifier encoded as an integer value.
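
To make the difference between the encodings concrete, here is a minimal pure-Python sketch that does not use the Snowflake library. The helper names and the bin edges are hypothetical, and the convention that a value equal to an interior edge falls into the bin on its right is an assumption, not something this page states.

```python
from bisect import bisect_right


def ordinal_encode(value, edges):
    """Return the 0-based bin index of `value` for the given bin edges.

    Only the interior edges are searched, so values below the first edge
    land in bin 0 and values above the last edge land in the last bin.
    """
    return bisect_right(edges[1:-1], value)


def onehot_dense_encode(bin_index, n_bins):
    """Return one 0/1 entry per bin, i.e. one output column per bin."""
    return [1 if i == bin_index else 0 for i in range(n_bins)]


# Hypothetical edges defining 3 bins: [0, 10), [10, 20), [20, 30]
edges = [0.0, 10.0, 20.0, 30.0]
values = [5.0, 10.0, 29.9]

ordinal = [ordinal_encode(v, edges) for v in values]
dense = [onehot_dense_encode(b, len(edges) - 1) for b in ordinal]

print(ordinal)
print(dense)
```

The 'ordinal' form yields a single integer per value, while 'onehot-dense' expands each value into as many 0/1 columns as there are bins. The 'onehot' form carries the same information as 'onehot-dense' in a sparse representation, whose exact layout is not described on this page.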

strategy: {'uniform', 'quantile'}, default='quantile'

Strategy used to define the widths of the bins.

  • 'uniform': All bins in each feature have identical widths.

  • 'quantile': All bins in each feature have the same number of points.
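
The two strategies can be illustrated with a small standard-library sketch. The helper names are hypothetical, and the linear interpolation used for the quantile cut points is an assumption; the library may compute its edges differently.

```python
from statistics import quantiles


def uniform_edges(values, n_bins):
    """Equal-width edges spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def quantile_edges(values, n_bins):
    """Edges at evenly spaced quantiles, so bins hold roughly equal counts."""
    # method="inclusive" treats the data as containing its own min and max.
    cuts = quantiles(values, n=n_bins, method="inclusive")
    return [min(values), *cuts, max(values)]


# One large outlier makes the two strategies diverge sharply.
data = [1, 2, 3, 4, 5, 6, 7, 100]

print(uniform_edges(data, 4))
print(quantile_edges(data, 4))
```

With this data, the uniform edges put seven of the eight values into the first bin because the outlier stretches the range, whereas the quantile edges place two values in each bin.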

input_cols: str or Iterable [column_name], default=None

Single or multiple input columns.

output_cols: str or Iterable [column_name], default=None

Single or multiple output columns.

drop_input_cols: boolean, default=False

Remove input columns from the output if set to True.

Attributes:
bin_edges_: ndarray of ndarray of shape (n_features,)

The edges of each bin. Contains arrays of varying shapes (n_bins_, ).

n_bins_: ndarray of shape (n_features,), dtype=np.int_

Number of bins per feature.

Methods

fit(dataset)

Fit KBinsDiscretizer with dataset.

get_output_cols()

Get output column names.

transform(dataset)

Discretize the data.
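
A usage sketch that ties the constructor arguments and methods together is shown below. It relies only on the signature and methods listed on this page. The function name `bin_column`, the column name "AGE", and the output column name are hypothetical, and `dataset` is assumed to be a DataFrame type that `fit` and `transform` accept.

```python
def bin_column(dataset, column="AGE"):
    """Fit a KBinsDiscretizer on one column and return the transformed dataset.

    `dataset` is assumed to be a DataFrame accepted by fit() and transform();
    the default column name "AGE" is purely illustrative.
    """
    # Imported lazily so this sketch loads even where the package is absent.
    from snowflake.ml.modeling.preprocessing import KBinsDiscretizer

    discretizer = KBinsDiscretizer(
        n_bins=4,                       # must be at least 2
        encode="ordinal",               # one integer bin id per row
        strategy="quantile",            # roughly equal row counts per bin
        input_cols=[column],
        output_cols=[f"{column}_BIN"],  # hypothetical output column name
        drop_input_cols=False,          # keep the original column
    )
    discretizer.fit(dataset)
    return discretizer.transform(dataset)
```

Calling `fit` and `transform` as separate statements avoids assuming anything about the return value of `fit`, which this page does not document.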