snowflake.ml.modeling.preprocessing.RobustScaler

class snowflake.ml.modeling.preprocessing.RobustScaler(*, with_centering: bool = True, with_scaling: bool = True, quantile_range: Tuple[float, float] = (25.0, 75.0), unit_variance: bool = False, input_cols: Optional[Union[str, Iterable[str]]] = None, output_cols: Optional[Union[str, Iterable[str]]] = None, passthrough_cols: Optional[Union[str, Iterable[str]]] = None, drop_input_cols: Optional[bool] = False)

Bases: BaseTransformer

Scales features using statistics that are robust to outliers. Values must be of float type.

For more details on what this transformer does, see sklearn.preprocessing.RobustScaler.
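A minimal usage sketch follows; it assumes snowflake-ml-python is installed and relies on the modeling API accepting a pandas DataFrame in place of a Snowpark DataFrame. Column and variable names are hypothetical.

import pandas as pd
from snowflake.ml.modeling.preprocessing import RobustScaler

# Toy data with an outlier in FEATURE_1.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "FEATURE_1": [1.0, 2.0, 3.0, 4.0, 100.0],
    "FEATURE_2": [10.0, 20.0, 30.0, 40.0, 50.0],
})

scaler = RobustScaler(
    input_cols=["FEATURE_1", "FEATURE_2"],
    output_cols=["FEATURE_1_SCALED", "FEATURE_2_SCALED"],
    passthrough_cols=["ID"],   # left untouched by fit and transform
)
scaler.fit(df)                 # computes the per-column median and IQR
result = scaler.transform(df)  # returns the same kind of DataFrame as the input
print(result)

Because the median and IQR, unlike the mean and standard deviation, are barely affected by the outlier value 100.0, the remaining values keep a reasonable spread after scaling.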

Args:
with_centering: bool, default=True

If True, center the data around zero before scaling.

with_scaling: bool, default=True

If True, scale the data to the interquartile range.

quantile_range: Tuple[float, float], default=(25.0, 75.0)

Tuple (q_min, q_max), where 0.0 < q_min < q_max < 100.0. Quantile range used to calculate scale_. By default, this is equal to the IQR, i.e., q_min is the first quartile and q_max is the third quartile.

unit_variance: bool, default=False

If True, scale data so that normally distributed features have a variance of 1. In general, if the difference between the x-values of q_max and q_min for a standard normal distribution is greater than 1, the dataset is scaled down; if less than 1, it is scaled up. A worked sketch of this computation appears after this parameter list.

input_cols: Optional[Union[str, List[str]]], default=None

The name(s) of one or more columns in a DataFrame containing a feature to be scaled.

output_cols: Optional[Union[str, List[str]]], default=None

The name(s) of one or more columns in a DataFrame in which the results will be stored. The number of output columns specified must match the number of input columns.

passthrough_cols: Optional[Union[str, List[str]]], default=None

A string or a list of strings indicating the names of columns to be excluded from all operations (such as train, transform, or inference); the specified column(s) remain untouched throughout. This option is helpful when input_cols is inferred automatically but specific columns, such as index columns, must be kept out of training and inference.

drop_input_cols: Optional[bool], default=False

If True, the input columns are removed from the output DataFrame.
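The following sketch illustrates with NumPy and SciPy how quantile_range and unit_variance determine the learned statistics in sklearn.preprocessing.RobustScaler, which this transformer mirrors; it is illustrative only, not the Snowflake implementation.

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
q_min, q_max = 25.0, 75.0  # the default quantile_range

center = np.median(x)  # used when with_centering=True
scale = np.percentile(x, q_max) - np.percentile(x, q_min)  # IQR, used when with_scaling=True

# unit_variance=True additionally divides the scale by the IQR of a standard
# normal distribution, norm.ppf(0.75) - norm.ppf(0.25), roughly 1.349, so that
# normally distributed features come out with variance 1.
adjust = stats.norm.ppf(q_max / 100.0) - stats.norm.ppf(q_min / 100.0)
scale = scale / adjust

x_scaled = (x - center) / scale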

Attributes:
center_: Dict[str, float]

Dictionary mapping input column name to the median value for that feature.

scale_: Dict[str, float]

Dictionary mapping input column name to the (scaled) interquartile range for that feature.
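After fit, these learned statistics can be inspected directly; a short sketch, reusing the hypothetical DataFrame from the example above:

scaler = RobustScaler(input_cols=["FEATURE_1"], output_cols=["FEATURE_1_SCALED"])
scaler.fit(df)
print(scaler.center_)  # {'FEATURE_1': 3.0}, the per-column median
print(scaler.scale_)   # {'FEATURE_1': 2.0}, the per-column (scaled) IQR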

Methods

fit(dataset)

Compute the center, scale, and quantile values of the dataset.

get_input_cols()

Input columns getter.

get_label_cols()

Label column getter.

get_output_cols()

Output columns getter.

get_params([deep])

Get parameters for this transformer.

get_passthrough_cols()

Passthrough columns getter.

get_sample_weight_col()

Sample weight column getter.

get_sklearn_args([default_sklearn_obj, ...])

Get sklearn keyword arguments.

set_drop_input_cols([drop_input_cols])

Drop input columns setter.

set_input_cols(input_cols)

Input columns setter.

set_label_cols(label_cols)

Label column setter.

set_output_cols(output_cols)

Output columns setter.

set_params(**params)

Set the parameters of this transformer.

set_passthrough_cols(passthrough_cols)

Passthrough columns setter.

set_sample_weight_col(sample_weight_col)

Sample weight column setter.

to_lightgbm()

to_sklearn()

Get an equivalent sklearn.preprocessing.RobustScaler object (see the sketch after this method list).

to_xgboost()

transform(dataset)

Center and scale the data.
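As noted above, a fitted transformer can be exported to an in-memory scikit-learn object via to_sklearn(). A brief sketch, continuing the earlier example and assuming the export carries the fitted state:

sk_scaler = scaler.to_sklearn()  # an equivalent sklearn.preprocessing.RobustScaler
print(sk_scaler.center_, sk_scaler.scale_)  # the same statistics, as arrays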
