snowflake.ml.modeling.impute.SimpleImputer

class snowflake.ml.modeling.impute.SimpleImputer(*, missing_values: Optional[Union[int, float, str, float64]] = nan, strategy: Optional[str] = 'mean', fill_value: Optional[Union[str, float]] = None, input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, passthrough_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False)

Bases: BaseTransformer

Univariate imputer for completing missing values with simple strategies. Note that the add_indicator parameter is not implemented. For more details on this class, see sklearn.impute.SimpleImputer.

Args:
missing_values: int, float, str, np.nan or None, default=np.nan.

The values to treat as missing and impute during transform.

strategy: str, default=“mean”.

The imputation strategy.

  • If “mean”, replace missing values using the mean along each column. Can only be used with numeric data.

  • If “median”, replace missing values using the median along each column. Can only be used with numeric data.

  • If “most_frequent”, replace missing values using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned.

  • If “constant”, replace the missing values with fill_value. Can be used with strings or numeric data.
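The four strategies can be illustrated with a minimal plain-Python sketch. It is not the library's implementation: the helper names (is_missing, compute_statistic, impute) and the sample data are hypothetical, and treating both None and NaN as missing is an assumption made for the illustration.

```python
import math
from collections import Counter
from statistics import mean, median


def is_missing(value):
    # Assumption for this sketch: None and NaN both count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def compute_statistic(column, strategy="mean", fill_value=None):
    """Return the replacement value for one column under the given strategy."""
    observed = [v for v in column if not is_missing(v)]
    if strategy == "mean":
        return mean(observed)
    if strategy == "median":
        return median(observed)
    if strategy == "most_frequent":
        counts = Counter(observed)
        top = max(counts.values())
        # Ties are broken by taking the smallest of the most frequent values.
        return min(v for v, n in counts.items() if n == top)
    if strategy == "constant":
        return fill_value
    raise ValueError(f"unknown strategy: {strategy!r}")


def impute(column, statistic):
    """Replace every missing entry in the column with the statistic."""
    return [statistic if is_missing(v) else v for v in column]


# Hypothetical sample columns.
ages = [10.0, math.nan, 20.0, 60.0]
cities = ["b", "a", None, "b", "a"]

mean_age = compute_statistic(ages, "mean")
median_age = compute_statistic(ages, "median")
top_city = compute_statistic(cities, "most_frequent")
```

In the cities column both “a” and “b” occur twice, so the tie-breaking rule picks “a”.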

fill_value: Optional[Union[str, float]]

When strategy == “constant”, fill_value is used to replace all occurrences of missing_values. For string or object data types, fill_value must be a string. If None, fill_value defaults to 0 when imputing numerical data and to the literal string “missing_value” for string and object data types.
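The default described above can be sketched in plain Python as follows. The helper resolve_fill_value is hypothetical, and deciding "numeric" by inspecting the non-null values is an assumption of this sketch rather than the library's type check.

```python
from numbers import Number


def resolve_fill_value(fill_value, column):
    """Pick the constant used for imputation when strategy is "constant"."""
    if fill_value is not None:
        # An explicit fill_value always wins.
        return fill_value
    observed = [v for v in column if v is not None]
    if all(isinstance(v, Number) for v in observed):
        # Numeric data defaults to 0.
        return 0
    # String and object data default to the literal string "missing_value".
    return "missing_value"


numeric_default = resolve_fill_value(None, [1.0, None, 3.0])
string_default = resolve_fill_value(None, ["a", None, "c"])
explicit = resolve_fill_value("unknown", ["a", None])
```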

input_cols: Optional[Union[str, List[str]]]

Columns to use as inputs during fit and transform.

output_cols: Optional[Union[str, List[str]]]

A string or list of strings representing column names that will store the output of transform operation. The length of output_cols must equal the length of input_cols.

passthrough_cols: Optional[Union[str, List[str]]]

A string or a list of strings naming columns to exclude from all operations (such as train, transform, or inference). These columns remain untouched throughout the process. This option is useful when input_cols is inferred automatically but specific columns, such as index columns, must be kept out of training or inference.

drop_input_cols: bool, default=False

Remove input columns from output if set to True.
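How input_cols, output_cols, and drop_input_cols interact can be sketched as column bookkeeping. The function plan_output_columns and the column names are hypothetical, and the resulting column order is an assumption of this sketch; the library's actual ordering may differ.

```python
def plan_output_columns(all_cols, input_cols, output_cols, drop_input_cols=False):
    """Return the column names expected after transform (illustrative only)."""
    if len(input_cols) != len(output_cols):
        raise ValueError("output_cols must have the same length as input_cols")
    # Drop the input columns only when requested.
    kept = [c for c in all_cols if not (drop_input_cols and c in input_cols)]
    # Append output columns that are not already present.
    added = [c for c in output_cols if c not in kept]
    return kept + added


cols = ["ID", "AGE", "CITY"]
with_inputs = plan_output_columns(cols, ["AGE", "CITY"], ["AGE_IMP", "CITY_IMP"])
without_inputs = plan_output_columns(
    cols, ["AGE", "CITY"], ["AGE_IMP", "CITY_IMP"], drop_input_cols=True
)
```

Here ID plays the role of a column that is neither an input nor an output and is therefore carried through unchanged.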

Attributes:
statistics_: dict {input_col: stats_value}

Dict containing the imputation fill value for each feature. Computing statistics can result in np.nan values. During transform, features corresponding to np.nan statistics will be discarded.
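The shape of statistics_ and the handling of np.nan statistics can be sketched in plain Python. The functions fit_statistics and transform below are hypothetical stand-ins that use the “mean” strategy and math.nan in place of np.nan; they are not the library's code.

```python
import math
from statistics import mean


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def fit_statistics(table, input_cols):
    """Build a {input_col: stats_value} dict using the mean of observed values."""
    stats = {}
    for col in input_cols:
        observed = [v for v in table[col] if not _is_nan(v)]
        # A column with no observed values yields a NaN statistic.
        stats[col] = mean(observed) if observed else math.nan
    return stats


def transform(table, stats):
    """Impute each column; columns whose statistic is NaN are discarded."""
    out = {}
    for col, stat in stats.items():
        if _is_nan(stat):
            continue
        out[col] = [stat if _is_nan(v) else v for v in table[col]]
    return out


# Hypothetical data: SCORE has no observed values at all.
table = {"AGE": [20.0, math.nan, 40.0], "SCORE": [math.nan, math.nan, math.nan]}
stats = fit_statistics(table, ["AGE", "SCORE"])
result = transform(table, stats)
```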

n_features_in_: int

Number of features seen during fit.

feature_names_in_: ndarray of shape (n_features_in_,)

Names of features seen during fit.

Raises:

SnowflakeMLException: If strategy is invalid, or if fill_value is specified for a strategy other than “constant”.
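The two error conditions can be sketched as a small validation function. This is an illustration only: validate_args is a hypothetical name, and ValueError stands in for SnowflakeMLException so that the sketch runs without the library.

```python
VALID_STRATEGIES = ("mean", "median", "most_frequent", "constant")


def validate_args(strategy, fill_value=None):
    """Raise if the strategy is unknown or fill_value is used without "constant"."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"invalid strategy: {strategy!r}")
    if fill_value is not None and strategy != "constant":
        raise ValueError("fill_value may only be set when strategy is 'constant'")


def is_rejected(strategy, fill_value=None):
    """Return True when the argument combination fails validation."""
    try:
        validate_args(strategy, fill_value)
    except ValueError:
        return True
    return False
```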

Methods

fit(dataset)

Compute values to impute for the dataset according to the strategy.

get_input_cols()

Input columns getter.

get_label_cols()

Label column getter.

get_output_cols()

Output columns getter.

get_params([deep])

Get parameters for this transformer.

get_passthrough_cols()

Passthrough columns getter.

get_sample_weight_col()

Sample weight column getter.

get_sklearn_args([default_sklearn_obj, ...])

Get sklearn keyword arguments.

set_drop_input_cols([drop_input_cols])

set_input_cols(input_cols)

Input columns setter.

set_label_cols(label_cols)

Label column setter.

set_output_cols(output_cols)

Output columns setter.

set_params(**params)

Set the parameters of this transformer.

set_passthrough_cols(passthrough_cols)

Passthrough columns setter.

set_sample_weight_col(sample_weight_col)

Sample weight column setter.

to_lightgbm()

to_sklearn()

to_xgboost()

transform(dataset)

Transform the input dataset by imputing the computed statistics in the input columns.
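A typical fit and transform sequence might look like the sketch below, which uses only the constructor arguments and methods documented on this page. The function name impute_age and the column names AGE and AGE_IMPUTED are hypothetical, and df is assumed to be a DataFrame of a type the library accepts.

```python
def impute_age(df):
    """Fill missing AGE values with the column median (illustrative sketch)."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from snowflake.ml.modeling.impute import SimpleImputer

    imputer = SimpleImputer(
        strategy="median",
        input_cols=["AGE"],            # hypothetical column name
        output_cols=["AGE_IMPUTED"],   # must match input_cols in length
        drop_input_cols=True,          # remove AGE from the transformed output
    )
    imputer.fit(df)                    # compute the median of AGE
    return imputer.transform(df)       # write the imputed values to AGE_IMPUTED
```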